Abstract

New non-asymptotic random coding theorems (with error probability $\epsilon$ and finite block length $n$)
based on the Gallager parity check ensemble and the Shannon random code ensemble with a fixed codeword type
are established for discrete input arbitrary output channels. The resulting non-asymptotic achievability
bounds, when combined with the non-asymptotic equipartition properties developed in the paper, can be
easily computed. Analytically, these non-asymptotic achievability bounds are shown to be asymptotically
tight up to the second order of the coding rate as $n$ goes to infinity with either constant or
sub-exponentially decreasing $\epsilon$. Numerically, they generally compare favourably, for finite $n$ and $\epsilon$ of practical
interest, with existing non-asymptotic achievability bounds in the literature.

Index Terms 

Channel capacity, non-asymptotic coding theorems, non-asymptotic equipartition properties, random
linear codes, Gallager parity check ensemble, Shannon random code, type.
I. Introduction

Recently, there has been great research interest in non-asymptotic channel coding theorems
in information theory. By non-asymptotic coding theorems, we mean tight lower and upper
bounds on the rate of certain codes or code ensembles in the regime of finite block length $n$
(typically ranging from hundreds to thousands) and (word) error probability $\epsilon$ (typically ranging
from 10^^ to 10^^), which is loosely referred to hereafter as the non- asymptotic regime. For 
example, several non-asymptotic achievability bounds on the Shannon random code ensemble have
been reported in [1], which, coupled with the non-asymptotic converse theorems therein, were shown
by numerical calculation to be very tight in the non-asymptotic regime for some special channels
such as a binary symmetric channel (BSC), a binary erasure channel (BEC), and an additive
white Gaussian noise (AWGN) channel.

Following [1], we are motivated in this paper to investigate whether similarly tight bounds
hold for some structured ensembles and general memoryless channels with finite input alphabet
and arbitrary output alphabet. Of particular interest is the Gallager parity check ensemble [2], in
which each element of the parity check matrix of a (linear) code is independently and uniformly
generated from the finite field input alphabet. Note that for the Gallager parity check ensemble,
codewords are not pairwise independent, and therefore bounding techniques for the Shannon random
code ensemble cannot be applied in general.

Let $P = \{p(y|x), x \in \mathcal{X}, y \in \mathcal{Y}\}$ be a channel with binary input alphabet $\mathcal{X}$. The channel $P$
is said to be memoryless binary-input output-symmetric (MBIOS) if the transition probability
distribution of the channel satisfies $p(y|0) = p(-y|1)$ for any $y \in \mathcal{Y}$. In the literature, several
non-asymptotic achievability bounds for linear codes have been developed for MBIOS channels.
They more or less follow the approach invented by Gallager in [2]. Specifically, given a linear
code $\mathcal{C}_n$ and a transmitted codeword $c^n$, the channel output space $\mathcal{Y}^n$ is divided into two parts,
$\mathcal{Y}^n_b$ (a bad region) and $\mathcal{Y}^n_g$ (a good region); the error probability (conditioned on the codeword
$c^n$) is then bounded as follows:

$$P_e(\mathcal{C}_n | c^n) \le \Pr\{Y^n \in \mathcal{Y}^n_b \,|\, X^n = c^n\} + \Pr\{\text{error}, Y^n \in \mathcal{Y}^n_g \,|\, X^n = c^n\}; \tag{1.1}$$

and the union bound with respect to all codewords other than $c^n$ is then applied to the second
probability term. Using Chernoff bounds [3], Gallager [2] then derived an achievability bound for
any deterministic code of block length $n$ with respect to its Hamming weight profile $\{N(l)\}_l$,
where $N(l)$ is the number of codewords with Hamming weight $l$, and further showed that
substituting the average Hamming weight profile of the Gallager parity check ensemble for $\{N(l)\}_l$
in this achievability bound yields a bound equal to the Error Exponent bound for the Shannon
random code ensemble in [4], multiplied by a non-exponential term.$^1$ For some special MBIOS
channels, the analysis of the two probabilities in (1.1) can be further refined. In particular, $\mathcal{Y}^n_g$ can
be selected such that the exact calculation of the first probability is feasible for any
finite block length, while for the second probability, the union bound can be applied conditioned
on the channel noise. Well-known results along this line include those of Poltyrev [6] for a BSC and a
binary input additive Gaussian channel (BIAGC). For BSCs, it was shown in [1] that Poltyrev's
bound on the Gallager parity check ensemble turns out to be the tightest achievability bound in the
non-asymptotic regime among all non-asymptotic achievability bounds on BSCs in the literature. For
BIAGCs, however, it was shown in [6] that the corresponding bound (i.e., the Tangential Sphere Bound
(TSB)), applied to the Gallager parity check ensemble, does not yield the same error exponent as that
of the Shannon random code ensemble (especially when the coding rate is close to the Shannon capacity
of the channel), and is therefore expected to be worse than the Error Exponent bound in
the non-asymptotic regime. To the best of our knowledge, for general MBIOS channels, the Error
Exponent bound remains the tightest achievability bound on the Gallager parity check ensemble; it is also
efficiently computable.

In this paper, a new non-asymptotic achievability bound is proved for the Gallager parity check
ensemble, which is applicable to any binary input memoryless channel$^2$ (BIMC). For some
special channels such as BSCs and BECs, this bound can be calculated exactly, and is shown
(both analytically and numerically) to be almost the same as the Dependence Testing bound in [1].
When combined with the non-asymptotic equipartition property developed in the appendices of the
paper, the new bound can be efficiently evaluated for any BIMC, including those with continuous
output such as BIAGCs. Asymptotic analysis then shows that the new bound is tight up to the
second order of the coding rate on any BIMC with certain symmetry as $n$ goes to infinity
with either constant or sub-exponentially decreasing $\epsilon$. Numerical calculation on BIAGCs shows
that the bound is tighter than the TSB and the Error Exponent bound in the non-asymptotic regime.
Therefore, compared to the Error Exponent bound, which is the tightest previously reported achievability
bound on the Gallager parity check ensemble that is computable for general MBIOS
channels, our achievability bound is more general (applicable to and computable for any BIMC
with or without any symmetry) and tighter in the non-asymptotic regime.

$^1$This result on the Gallager parity check ensemble was later enhanced by Shulman and Feder [5], who showed that the
non-exponential term could be further eliminated.

$^2$Our new non-asymptotic achievability bound is also applicable to any memoryless channel with a finite field input alphabet.
To facilitate our discussion, however, we choose to focus on the case of binary input alphabet when the Gallager parity check
ensemble is considered.

Our bounding technique can also be applied to the Shannon random code ensemble with a fixed
codeword type on any discrete input memoryless channel (DIMC), in which each codeword
is independently and uniformly generated from the set of sequences with the same type. The
resulting achievability bound can be linked to the $\kappa\beta$ bound, one of the tightest achievability bounds
in the literature, proved in [1] via a deterministically constructed code. An easy-to-compute
version of the bound is then obtained by applying the non-asymptotic equipartition property, and is shown
again to be tight up to the second order of the coding rate for any DIMC as $n$ goes to infinity with
either constant or sub-exponentially decreasing $\epsilon$. Numerical calculation on Z channels shows that
this achievability bound is tighter than the Error Exponent bounds on Shannon random codes with and
without type constraint, derived by Fano [7] and Gallager [4] respectively, in the non-asymptotic
regime.

The rest of the paper is organized as follows. Non-asymptotic coding theorems for the Gallager
parity check ensemble on BIMCs and their asymptotic results are presented in Section II, while
their counterparts for the Shannon random code ensemble with a fixed codeword type on DIMCs are
presented in Section III. Proofs of the theorems in Sections II and III are given in Sections
IV to VII. Section VIII is devoted to a comparison between our non-asymptotic achievability bounds and
existing results in the literature, and the conclusion is drawn in Section IX.



II. Non-asymptotic Coding Theorems for Gallager Parity Check Ensemble 

In this section, we present non-asymptotic coding results for random linear codes of block 
length n based on Gallager parity check ensemble for any BIMC. 

Fix an arbitrary BIMC $\{p(y|x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$ with $\mathcal{X} = \{0, 1\}$. Denote its channel capacity
by $C_{\mathrm{BIMC}}$ and define its linear capacity as

$$C_{\mathrm{BIMC-L}} = \ln 2 - H(X|Y)$$
where $X$ is a uniform input random variable, and $Y$ is the corresponding output of the BIMC.
(Here and throughout the rest of the paper, information quantities such as entropy, conditional
entropy, mutual information, and divergence (or relative entropy) are measured in nats, and $\ln$
stands for the logarithm with base $e$.) Let $p(y)$ be the pmf or pdf (as the case may be) of $Y$,
and $p(x|y)$ the conditional pmf of $X$ given $Y$. It is easy to see that

$$p(y) = \frac{1}{2}\left[p(y|0) + p(y|1)\right]$$

and

$$p(x|y) = \frac{p(y|x)}{p(y|0) + p(y|1)}.$$
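As a small numerical illustration of the linear capacity, the following sketch evaluates $\ln 2 - H(X|Y)$ for a binary-input channel with a finite output alphabet, using the two expressions for $p(y)$ and $p(x|y)$ above. The function names and the list-based channel representation are assumptions made for this example only; they do not come from the paper.

```python
import math

def conditional_entropy_uniform_input(p_y_given_0, p_y_given_1):
    """H(X|Y) in nats for uniform X; arguments list p(y|0) and p(y|1) over a finite output alphabet."""
    h = 0.0
    for p0, p1 in zip(p_y_given_0, p_y_given_1):
        total = p0 + p1                      # equals 2 p(y)
        for p_yx in (p0, p1):
            if p_yx > 0.0:
                # joint p(x, y) = p(y|x) / 2 and posterior p(x|y) = p(y|x) / total
                h -= 0.5 * p_yx * math.log(p_yx / total)
    return h

def linear_capacity(p_y_given_0, p_y_given_1):
    """Linear capacity ln 2 - H(X|Y), in nats."""
    return math.log(2.0) - conditional_entropy_uniform_input(p_y_given_0, p_y_given_1)

p = 0.12                                     # BSC crossover probability (illustrative value)
c_bits = linear_capacity([1 - p, p], [p, 1 - p]) / math.log(2.0)
print(f"BSC({p}) linear capacity: {c_bits:.4f} bits per channel use")
```

For the BSC the uniform input is capacity achieving, so the value printed here is also the ordinary capacity; for an asymmetric channel such as a Z channel the same function returns a value that may be strictly below the capacity.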

Let $\mathcal{C}_{n,k}$ be a linear code with block length $n$ and parity check matrix $H_{(n-k)\times n}$. Assuming
codewords are ordered in some manner, we shall refer to the $q$-th codeword in $\mathcal{C}_{n,k}$ as $x^n(q)$.
We say $H_{(n-k)\times n}$ is randomly picked from the Gallager parity check ensemble $\mathcal{H}_{n,k}$ if the entries of
$H_{(n-k)\times n}$ are independently and uniformly generated from $\mathcal{X} = \{0,1\}$. Denote the ensemble
of linear codes with their parity check matrices from $\mathcal{H}_{n,k}$ by $\mathcal{C}^{(\mathrm{Gal})}_{n,k}$. To facilitate our subsequent
discussion, we also specify the encoding procedure (i.e. the mapping from messages to
codewords) of $\mathcal{C}^{(\mathrm{Gal})}_{n,k}$: given $H_{(n-k)\times n}$, $x^n(q)$ is the $q$-th vector in the null space of $H_{(n-k)\times n}$
in lexicographical order for $0 \le q \le 2^{n-\mathrm{rank}(H_{(n-k)\times n})} - 1$. By convention, we assume that all
messages are equally likely. With slight abuse of notation, we shall use $q$ to represent both the
uniformly distributed random message and its specific realization; its exact meaning, however,
will be clear from the context. Note that all codes in $\mathcal{C}^{(\mathrm{Gal})}_{n,k}$ have channel coding rate greater
than or equal to $\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) = \frac{k}{n}\ln 2$ (in nats). The decoding procedure (named jar decoding)
is then specified as follows: given the channel output $y^n$, the decoder forms the set (also called
the BIMC-L jar for convenience)

$$J(y^n) = \left\{x^n \in \mathcal{X}^n : -\frac{1}{n}\ln\frac{\prod_{i=1}^n p(y_i|x_i)}{\prod_{i=1}^n [p(y_i|0)+p(y_i|1)]} \le H(X|Y) + \delta\right\}, \tag{2.1}$$

declares an error if no codeword is inside $J(y^n)$, and otherwise picks an arbitrary codeword in $J(y^n)$ as
the estimate of the transmitted codeword. (Note that the case when more than one
codeword is inside $J(y^n)$ is considered a tie by the decoder, which is broken in an arbitrary
way.$^3$) It is easy to verify that

$$|J(y^n)| \le e^{n(H(X|Y)+\delta)} \tag{2.2}$$

for any $y^n$. Further define

$$P_\delta = \Pr\left\{-\frac{1}{n}\sum_{i=1}^n \ln p(X_i|Z_i) > H(X|Y) + \delta\right\} \tag{2.3}$$

where $X_1X_2\cdots X_n$ is an independent, identically and uniformly distributed sequence and
$Z_1Z_2\cdots Z_n$ is the corresponding BIMC output.
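To make the jar concrete, here is a minimal brute-force sketch for a BSC with crossover probability $p$, for which $-\frac{1}{n}\ln p(x^n|y^n)$ depends only on the Hamming distance between $x^n$ and $y^n$. It enumerates all of $\{0,1\}^n$, so it is only meant for tiny $n$; the parameter values and function names are illustrative assumptions. The size bound (2.2) can be read off from the fact that every member of the jar has posterior probability at least $e^{-n(H(X|Y)+\delta)}$ and the posteriors sum to one.

```python
import itertools
import math

def neg_log_posterior_rate(x, y, p):
    """-(1/n) ln p(x^n | y^n) for a BSC(p) with uniform input."""
    n = len(x)
    d = sum(xi != yi for xi, yi in zip(x, y))   # Hamming distance between x and y
    return (-d * math.log(p) - (n - d) * math.log(1.0 - p)) / n

def jar(y, p, delta):
    """All x^n in {0,1}^n with -(1/n) ln p(x^n|y^n) <= H(X|Y) + delta."""
    h = -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)   # H(X|Y) in nats
    return [x for x in itertools.product((0, 1), repeat=len(y))
            if neg_log_posterior_rate(x, y, p) <= h + delta]

n, p, delta = 10, 0.12, 0.05                    # illustrative values
y = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1)              # an arbitrary channel output
j = jar(y, p, delta)
h = -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
size_bound = math.exp(n * (h + delta))          # right-hand side of the jar size bound
print(len(j), "sequences in the jar; size bound", round(size_bound, 2))
```

With these values the jar is the Hamming ball of radius 1 around $y^n$, which is well within the size bound.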
Puncturing $q = 0$ from the message space and ignoring its insignificant effect on the rate, we have
the following non-asymptotic coding theorem, which is proved in Section IV.



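Theorem 1. Given a BIMC with linear capacity $C_{\mathrm{BIMC-L}}$, let $P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k})$ denote the average
word error probability (under jar decoding) of $\mathcal{C}^{(\mathrm{Gal})}_{n,k}$ with respect to the random message $q$, the
BIMC, and the random linear code $\mathcal{C}^{(\mathrm{Gal})}_{n,k}$ itself. Then for any block length $n$ and $\delta > 0$,

$$P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le \frac{1}{1-2^{-n}} P_\delta + e^{-n\left(C_{\mathrm{BIMC-L}} - \delta - \mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})\right)}. \tag{2.4}$$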



Remark 1. The key idea of the proof of Theorem 1, as shown in Section IV, is to bound the
error probability (under jar decoding) in two parts:

$$P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le \Pr\{X^n(q) \notin J(Y^n)\} + \Pr\{\exists z^n \ne X^n(q), z^n \in J(Y^n), z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k}\}.$$

Although this approach shares certain similarities with Gallager's proof technique illustrated in
Section I, the key difference is that, since all codewords inside the jar are treated equally,
the second probability is handled by the union bound applied to all sequences inside $J(Y^n)$,
instead of all codewords other than $X^n(q)$. Therefore, no symmetry of the channel is required in
our proof.

$^3$This decoding rule is closely related to Feinstein's threshold decoding. The difference is that when more than one
codeword is inside the jar or passes the threshold, the jar decoder treats the case as a tie, which is arbitrarily broken, while the
threshold decoder selects the codeword with the lowest index. The reason for calling this decoding rule jar decoding
instead of modified threshold decoding is threefold: (1) it leads us to a philosophically different way to handle the second
probability in (1.1), as discussed in Remark 1 and illustrated in the proof of Theorem 1; (2) it allows us to easily identify which
probability in (1.1) is dominating, as discussed in Remark 4; and (3) by treating all codewords inside the jar equally, the decoder
is not confined to solving any specific optimization problem, which, along with the flexibility in forming the jar itself, we
hope may lead one to look at practical decoding in a different way.

Remark 2. The purpose of puncturing $q = 0$ from the message space is to make the proof a
little bit simpler. From the proof in Section IV it can be seen that if we add $q = 0$ back, it



only increases the error probability upper bound by 2^"^*^''"''' \ Moreover, when the channel 
has certain symmetry, i.e. $-\ln p(0|Y)$ given $X = 0$ and $-\ln p(1|Y)$ given $X = 1$ share the same
distribution (we call such a channel a binary input memoryless symmetric channel (BIMSC)),
puncturing the zero message is not necessary and the term $\frac{1}{1-2^{-n}}$ in (2.4) can be dropped. Note
that the set of BIMSCs includes both MBIOS channels and the weakly symmetric channels defined
in [8] as special cases, and in the case of a BIMSC, $C_{\mathrm{BIMSC}} = C_{\mathrm{BIMSC-L}}$ always holds.

Remark 3. The proof technique of Theorem 1 can also be applied to the Shannon random code
ensemble (with uniform input distribution) and the Elias generator ensemble [9], in which the
generator matrices of linear codes are generated in the same way as the parity check matrices
in the Gallager ensemble. In fact, the proof for those ensembles is even simpler, and the term
$\frac{1}{1-2^{-n}}$ in (2.4) can be dropped.



As can be seen, the error probability bound in (2.4) is in a parametric form with respect to $\delta$.
In other words, given the block length $n$ and the channel coding rate $\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})$ (or equivalently
$k$), (2.4) holds for any value of $\delta$. It is not hard to see that $P_\delta$ and $e^{-n(C_{\mathrm{BIMC-L}}-\delta-\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k}))}$
are respectively decreasing and increasing functions of $\delta$. Consequently, there is an optimal $\delta$
which minimizes (2.4). For some special channels such as BSCs and BECs, $P_\delta$ can be efficiently
calculated for any $\delta$, and therefore the optimization of (2.4) with respect to $\delta$ can be solved exactly.
However, for other channels, especially those with continuous output (like BIAGCs), it
is extremely difficult to evaluate $P_\delta$ directly. To overcome this problem, tight upper and lower
bounds on $P_\delta$ are established in Appendix A. By combining these bounds on $P_\delta$ with Theorem
1, we then derive an achievability bound of an analytic form. Towards this end, some definitions are
needed.
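For a BSC the optimization over $\delta$ mentioned above can be carried out directly, because $-\ln p(X_i|Z_i)$ takes only two values and $P_\delta$ reduces to a binomial tail. The sketch below assumes a BSC with crossover probability $p < 1/2$, evaluates the right-hand side of the Theorem 1 bound $\frac{1}{1-2^{-n}}P_\delta + e^{-n(C_{\mathrm{BIMC-L}}-\delta-\mathcal{R})}$, and searches a simple grid of $\delta$ values; the grid search and all function names are choices made for this illustration, not the paper's numerical procedure.

```python
import math

def entropy_nats(p):
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def log_binom_pmf(n, k, p):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def p_delta_bsc(n, p, delta):
    """Exact Pr{-(1/n) sum ln p(X_i|Z_i) > H(X|Y) + delta} for a BSC(p), p < 1/2."""
    h = entropy_nats(p)
    a, b = -math.log(p), -math.log(1.0 - p)   # per-symbol values on a flip / no flip
    total = 0.0
    for k in range(n + 1):                    # k = number of channel flips
        if (k * a + (n - k) * b) / n > h + delta:
            total += math.exp(log_binom_pmf(n, k, p))
    return total

def theorem1_bound(n, p, rate_nats, delta):
    """P_delta / (1 - 2^-n) + exp(-n (C - delta - R)) for a BSC(p)."""
    cap = math.log(2.0) - entropy_nats(p)
    first = p_delta_bsc(n, p, delta) / (1.0 - 2.0 ** (-n))
    second = math.exp(-n * (cap - delta - rate_nats))
    return first + second

def best_bound(n, p, rate_nats, grid=200):
    """Grid search over delta in (0, C - R); returns (bound, delta)."""
    gap = math.log(2.0) - entropy_nats(p) - rate_nats
    candidates = [gap * i / grid for i in range(1, grid)]
    return min((theorem1_bound(n, p, rate_nats, d), d) for d in candidates)

bound, delta_star = best_bound(200, 0.11, 0.3 * math.log(2.0))
print(f"best bound {bound:.3e} at delta = {delta_star:.4f} nats")
```

Because $P_\delta$ is a step function of $\delta$ for a BSC, a finer grid (or a search over the binomial thresholds themselves) may tighten the result slightly.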

Let us temporarily drop the assumption that $X$ is discrete and adopt the convention that $\int dx$
is interpreted as $\sum_{x \in \mathcal{X}}$ when $X$ is discrete. Now given a random variable pair $(X,Y)$ with
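distribution $p(x,y)$, let

$$\lambda^*(X|Y) = \sup\left\{\lambda \ge 0 : \int\!\!\int p(y)\,p^{-\lambda+1}(x|y)\,dx\,dy < \infty\right\}.$$

Suppose that

$$\lambda^*(X|Y) > 0. \tag{2.5}$$

Define for any $\delta \ge 0$

$$r_{X|Y}(\delta) = \sup_{\lambda \ge 0}\left[\lambda\,(H(X|Y)+\delta) - \ln\int\!\!\int p(y)\,p^{-\lambda+1}(x|y)\,dx\,dy\right]$$

and for any $\lambda \in [0, \lambda^*(X|Y))$

$$f_\lambda(x,y) = \frac{p^{-\lambda}(x|y)}{\int\!\!\int p(v)\,p^{-\lambda+1}(u|v)\,du\,dv}$$

$$\delta(\lambda) = \int\!\!\int p(x,y)\,f_\lambda(x,y)\,[-\ln p(x|y)]\,dx\,dy - H(X|Y)$$

$$\sigma^2_H(X|Y,\lambda) = \int\!\!\int f_\lambda(x,y)\,p(y)\,p(x|y)\,\left|-\ln p(x|y) - (H(X|Y)+\delta(\lambda))\right|^2 dx\,dy$$

$$M_H(X|Y,\lambda) = \int\!\!\int f_\lambda(x,y)\,p(y)\,p(x|y)\,\left|-\ln p(x|y) - (H(X|Y)+\delta(\lambda))\right|^3 dx\,dy$$

$$\bar{\xi}_H(X|Y,\lambda,n) = \frac{2CM_H(X|Y,\lambda)}{\sqrt{n}\,\sigma^3_H(X|Y,\lambda)} + e^{\frac{n\lambda^2\sigma^2_H(X|Y,\lambda)}{2}}\left[Q\!\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) - Q\!\left(\rho^* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right)\right] \tag{2.6}$$

where

$$Q(t) = \frac{1}{\sqrt{2\pi}}\int_t^{\infty} e^{-u^2/2}\,du, \qquad Q(\rho^*) = \frac{CM_H(X|Y,\lambda)}{\sqrt{n}\,\sigma^3_H(X|Y,\lambda)},$$

and $0 < C < 0.4784$ is the universal constant in the Berry-Esseen central
limit theorem [10]. Denote $\sigma^2_H(X|Y, 0)$ by $\sigma^2_H(X|Y)$ and $M_H(X|Y, 0)$ by $M_H(X|Y)$, and define

$$\Delta^*(X|Y) = \lim_{\lambda \uparrow \lambda^*(X|Y)} \delta(\lambda)$$

where the above limit exists as shown in Appendix A. Further assume that

$$\sigma^2_H(X|Y) > 0 \text{ and } M_H(X|Y) < \infty. \tag{2.7}$$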

Now let $X$ be the uniform input random variable to the BIMC, and $Y$ the corresponding
output random variable of the BIMC. Combining Theorem 1 with the non-asymptotic bounds on $P_\delta$
developed in Appendix A, we then get the following result, which is proved in Section V.
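Theorem 2. For any BIMC with $\sigma^2_H(X|Y) > 0$, $\lambda^*(X|Y) > 0$, and $M_H(X|Y) < \infty$ and any
block length $n$, the following hold:
1) For any $\delta \in (0, \Delta^*(X|Y))$

$$P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le \left(\frac{1}{1-2^{-n}} + \lambda\right)\bar{\xi}_H(X|Y,\lambda,n)\,e^{-n r_{X|Y}(\delta)} \tag{2.8}$$

whenever

$$\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le C_{\mathrm{BIMC-L}} - \delta - r_{X|Y}(\delta) + \frac{\ln\left[\lambda\,\bar{\xi}_H(X|Y,\lambda,n)\right]}{n} \tag{2.9}$$

where $\lambda = r'_{X|Y}(\delta)$.
2) For any real number $c$

$$P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le \frac{1}{1-2^{-n}}\left[Q\!\left(\frac{c}{\sigma_H(X|Y)}\right) + \frac{CM_H(X|Y)}{\sqrt{n}\,\sigma^3_H(X|Y)}\right] + \frac{e^{-\frac{c^2}{2\sigma^2_H(X|Y)}}}{\sqrt{2\pi n}\,\sigma_H(X|Y)} \tag{2.10}$$

whenever

$$\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le C_{\mathrm{BIMC-L}} - \frac{c}{\sqrt{n}} - \frac{\ln n}{2n} - \frac{\frac{c^2}{2\sigma^2_H(X|Y)} + \ln\left[\sqrt{2\pi}\,\sigma_H(X|Y)\right]}{n}. \tag{2.11}$$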
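The normal-approximation flavour of part 2) can be illustrated numerically for a BSC. The sketch below does not reproduce the paper's computation; it assumes the following route, which only uses quantities defined above: set $\delta = c/\sqrt{n}$, bound $P_\delta$ by $Q(c/\sigma_H) + CM_H/(\sqrt{n}\sigma_H^3)$ via the Berry-Esseen theorem (with the numerical value 0.4784 used for the constant), add the exponential term $e^{-n(C_{\mathrm{BIMC-L}}-\delta-\mathcal{R})}$ from the Theorem 1 bound, and minimize over a grid of $c$. The closed-form moments for the BSC and all names are assumptions of this example.

```python
import math

BERRY_ESSEEN_C = 0.4784   # numerical bound on the Berry-Esseen constant quoted in the text

def q_func(t):
    """Gaussian tail probability Q(t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def bsc_moments(p):
    """(sigma_H, M_H): standard deviation and third absolute central moment
    of -ln p(X|Y) for a BSC(p) with uniform input."""
    spread = abs(math.log((1.0 - p) / p))     # gap between the two values of -ln p(x|y)
    var = p * (1.0 - p) * spread ** 2
    third = p * (1.0 - p) * ((1.0 - p) ** 2 + p ** 2) * spread ** 3
    return math.sqrt(var), third

def normal_approx_bound(n, p, rate_nats, c):
    """Upper bound on the word error probability with delta = c / sqrt(n)."""
    h = -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
    cap = math.log(2.0) - h
    sigma, third = bsc_moments(p)
    p_delta_ub = q_func(c / sigma) + BERRY_ESSEEN_C * third / (math.sqrt(n) * sigma ** 3)
    second = math.exp(-n * (cap - c / math.sqrt(n) - rate_nats))
    return min(1.0, p_delta_ub / (1.0 - 2.0 ** (-n)) + second)

p, n = 0.12, 1000
cap = math.log(2.0) + p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
rate = 1.0021 * cap                           # 0.21% above capacity
best = min(normal_approx_bound(n, p, rate, i / 1000.0) for i in range(-1000, 1001))
print(f"bound on word error probability: {best:.3f}")
```

Even though the rate exceeds capacity, the bound stays strictly below 1, which is the phenomenon discussed in Remark 5.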



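Remark 4. As shown in the proof of Theorem 2 in Section V, given the coding rate $\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})$,
the optimal $\delta$ is yielded by making

$$e^{-n\left(C_{\mathrm{BIMC-L}}-\delta-\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})\right)} = \lambda\,\bar{\xi}_H(X|Y,\lambda,n)\,e^{-n r_{X|Y}(\delta)}$$

and

$$e^{-n\left(C_{\mathrm{BIMC-L}}-\delta-\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})\right)} = \frac{e^{-\frac{c^2}{2\sigma^2_H(X|Y)}}}{\sqrt{2\pi n}\,\sigma_H(X|Y)}, \quad \delta = \frac{c}{\sqrt{n}},$$

in parts 1) and 2) of Theorem 2 respectively. In both cases,

$$P_\delta \gg e^{-n\left(C_{\mathrm{BIMC-L}}-\delta-\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})\right)}$$

for the optimal $\delta$ when $\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})$ is close to $C_{\mathrm{BIMC-L}}$. On the contrary, in Gallager's error
exponent analysis illustrated in the introduction, $\mathcal{Y}^n_g$ was chosen such that the first and
second probabilities share the same exponent, for the sake of the tightness of the error exponent.
This difference, coupled with the fact that the non-asymptotic bounds on $P_\delta$ in Appendix A are tighter
than the Chernoff bound, explains why our achievability bound can be tighter than the Error Exponent bound
in the non-asymptotic regime. Another advantage of applying the non-asymptotic bounds on $P_\delta$ is
that we do not have to choose $J(Y^n)$ for the sake of easy computation of $P_\delta$, which explains
why our achievability bound can be tighter than the TSB on the BIAGC.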



Remark 5. The inequalities (2.10) and (2.11) show that if the word error probability is kept
slightly above 0.5, the code rate can even be slightly above the capacity of the BIMC with
$C_{\mathrm{BIMC}} = C_{\mathrm{BIMC-L}}$! Figure 1 shows the tradeoff between the word error probability and block
length when the code rate is 0.21% above the capacity for the BSC with cross-over probability
$p = 0.12$, where in Figure 1 both the capacity and the code rate are expressed in bits. As
can be seen from Figure 1, at block length 1000 the word error probability is around 0.65,
and the code rate is 0.21% above the capacity! Although this phenomenon has been implied by
the second order analysis of the coding rate as $n$ goes to $\infty$ [1], [11]-[14], the inequalities
(2.10) and (2.11) allow us to demonstrate it for specific values of $n$ and for random linear
codes based on the Gallager parity check ensemble.



[Figure 1: error probability vs. block length when the rate is above capacity; $p = 0.12$, capacity $= 0.471$, rate $= 0.472$.]

Fig. 1. Tradeoff between the word error probability and block length when the code rate is above the capacity with $p = 0.12$.

Remark 6. Parts 1) and 2) of Theorem 2 both provide non-asymptotic achievability bounds on
the error probability and coding rate of Gallager's ensemble, which invites a comparison between
them. It turns out that, for a given block length, either of these achievability bounds can be tighter than
the other, depending on the coding rate region. When the coding rate is above capacity, part 1) is not
applicable, while part 2) can still bound the error probability strictly below 1, as shown in the
above discussion. However, when the coding rate is below capacity, part 1) will be tighter than
part 2) as long as the coding rate is not too close to the channel capacity. A numerical comparison
between part 1) and part 2) is shown in Figure 2 for the BIAGC with block length 1000 and SNR
0 dB, where the coding rate is kept below the channel capacity $\approx 0.4847$ (bits per channel
use). As can be seen, as the coding rate moves away from the channel capacity, part 1)
becomes much tighter.

[Figure 2: horizontal axis: rate (bits per channel use); vertical axis on a logarithmic scale.]

Fig. 2. Part 1) vs. part 2) of Theorem 2 on the BIAGC with block length $n = 1000$ and SNR $= 0$ dB.

Although our focus in this paper is on non-asymptotic coding theorems, it is instructive to
see how tight our achievability bounds in Theorem 2 are asymptotically as $n$ goes to $\infty$.
We have the following asymptotic result, which is proved in Section VI.

Corollary 1. Given a BIMC with $\sigma^2_H(X|Y) > 0$, $\lambda^*(X|Y) > 0$, and $M_H(X|Y) < \infty$, let
= ""^^"^^ Q^^i^n) for < e„ < 1. Suppose — = o(l) as n ^ +oo. Then we have 

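$$\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \ge C_{\mathrm{BIMC-L}} - \delta_n - o(\delta_n) \tag{2.12}$$

while $P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le \epsilon_n$.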
Remark 7. Given a BIMSC, results in [1], [11]-[16] imply that $C_{\mathrm{BIMSC}}$ and $-\delta_n$ are the first
and second order of the best coding rate that can be achieved by any code when the error
probability is a constant or sub-exponentially decreasing with respect to $n$. Corollary 1 shows
that the optimal first and second order coding performance can be achieved by the Gallager ensemble
under jar decoding as well. This in turn implies that the achievability bounds in Theorem 2 are
asymptotically tight as $n$ goes to $\infty$ with either a constant or sub-exponentially decreasing error
probability with respect to $n$.

III. Non-asymptotic Coding Theorems for Shannon Random Code Ensemble with a Fixed Codeword Type

Consider now an arbitrary DIMC $P = \{p(y|x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$. Let $X$ be the capacity-achieving
input random variable, and let $Y$ be the output of the DIMC $P$ in response to $X$. Then
the capacity of the DIMC $P$ is

$$C_{\mathrm{DIMC}} = I(X; Y).$$

Now let us move away from linear codes in this section, and use random codes drawn from a
particular type instead. Towards this end, let us introduce some standard definitions involving types.
Let $\mathcal{P}(\mathcal{X})$ represent the set of all probability distributions on $\mathcal{X}$. For any $t \in \mathcal{P}(\mathcal{X})$, $t(x)$ denotes
the probability of $x$ under $t$. The set of types $\mathcal{P}_n(\mathcal{X})$ is the subset of $\mathcal{P}(\mathcal{X})$ such that $t \in \mathcal{P}_n(\mathcal{X})$
if and only if $n\,t(x)$ is an integer for every $x \in \mathcal{X}$. For any $t \in \mathcal{P}_n(\mathcal{X})$, let $\mathcal{T}^n_t \subseteq \mathcal{X}^n$ be the
set of sequences with empirical distribution $t$. Define for any $t \in \mathcal{P}(\mathcal{X})$

$$D(t,x) = \int p(y|x)\ln\frac{p(y|x)}{q_t(y)}\,dy \tag{3.1}$$

$$I(t;P) = \sum_{x\in\mathcal{X}} t(x)\int p(y|x)\ln\frac{p(y|x)}{q_t(y)}\,dy = \sum_{x\in\mathcal{X}} t(x)D(t,x) \tag{3.2}$$

where

$$q_t(y) = \sum_{x\in\mathcal{X}} t(x)\,p(y|x).$$

Clearly, $D(t, x)$ is the divergence or relative entropy between $p(y|x)$ and $q_t(y)$, and $I(t; P)$ is the
mutual information between the input and output of the DIMC $P$ when the input is distributed
according to $t$. In addition, it can be easily verified that

$$I(t;P) = C_{\mathrm{DIMC}} + O(n^{-2}) \tag{3.3}$$

whenever

\\t-px\\i<— (3.4) 
n 

where $p_X$ is the capacity-achieving distribution, i.e. the distribution of $X$ maximizing $I(X; Y)$,
and $\|\cdot\|_1$ is the $l_1$-norm. Obviously, types $t$ satisfying (3.4) exist.
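The following sketch illustrates these definitions on a toy Z channel: it computes $I(t;P)$ from $q_t$ and rounds a given input distribution to a nearby type with denominator $n$. The largest-remainder rounding rule, the channel parameters and the function names are assumptions of this example; the paper only requires that a type close to $p_X$ exists. For this particular Z channel (input 0 noiseless, input 1 flipped with probability 0.5) the input distribution $(0.6, 0.4)$ attains $\ln 1.25$ nats.

```python
import math

def mutual_information(t, channel):
    """I(t;P) in nats; channel[x][y] = p(y|x) and t[x] = input probability."""
    num_outputs = len(channel[0])
    q = [sum(t[x] * channel[x][y] for x in range(len(t))) for y in range(num_outputs)]
    total = 0.0
    for x, tx in enumerate(t):
        for y in range(num_outputs):
            p_yx = channel[x][y]
            if tx > 0.0 and p_yx > 0.0:
                total += tx * p_yx * math.log(p_yx / q[y])
    return total

def nearest_type(dist, n):
    """Round a distribution to a type with denominator n (largest-remainder rule)."""
    counts = [math.floor(v * n) for v in dist]
    by_remainder = sorted(range(len(dist)), key=lambda i: dist[i] * n - counts[i], reverse=True)
    for i in by_remainder[: n - sum(counts)]:   # hand out the leftover units
        counts[i] += 1
    return [c / n for c in counts]

z_channel = [[1.0, 0.0], [0.5, 0.5]]            # toy Z channel (illustrative)
p_x = [0.6, 0.4]
t = nearest_type(p_x, 101)
print("type:", t)
print("I(p_X;P) - I(t;P) =", mutual_information(p_x, z_channel) - mutual_information(t, z_channel))
```

The printed gap is tiny compared with the deviation of $t$ from $p_X$, in line with the second-order behaviour stated in (3.3).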

Now let $\mathcal{C}_{t,n,k}$ denote the ensemble of channel codes from a type $t$ with code length $n$ and
rate $\mathcal{R}(\mathcal{C}_{t,n,k}) = \frac{k}{n}\ln 2$, where a channel code from $\mathcal{C}_{t,n,k}$ is generated in such a way that each
codeword is independently and uniformly picked from $\mathcal{T}^n_t$. At the decoder, another version of
jar decoding is used: given the channel output $y^n$, the set $J(y^n)$ is formed as

$$J(y^n) = \left\{x^n \in \mathcal{T}^n_t : -\frac{1}{n}\sum_{i=1}^n \ln\frac{p(y_i|x_i)}{q_t(y_i)} \le -I(t;P) + \delta\right\} \tag{3.5}$$

where $\delta$ is a real number; the decoder then declares an error if there is no codeword in
$J(y^n)$, and otherwise picks an arbitrary codeword in $J(y^n)$ as the estimate of the transmitted codeword.
(Note that once again, the case when more than one codeword is inside $J(y^n)$ is
considered a tie, which is broken in an arbitrary way.) The set defined in (3.5) will be referred
to as the DIMC jar based on type $t$.
Define for any $x^n \in \mathcal{T}^n_t$

$$P_{t,\delta} = \Pr\left\{-\frac{1}{n}\sum_{i=1}^n \ln\frac{p(Y_i|x_i)}{q_t(Y_i)} > -I(t;P) + \delta \,\middle|\, X^n = x^n\right\} \tag{3.6}$$

where $Y^n = Y_1Y_2\cdots Y_n$ is the DIMC response to the input $X^n$. Note that $P_{t,\delta}$ is well defined since the
probability on the right-hand side of (3.6) depends on $x^n$ only through its type $t$. Then we have
the following non-asymptotic coding theorem.

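Theorem 3. Given any DIMC $P$, let $P_e(\mathcal{C}_{t,n,k})$ denote the average word error probability (under
jar decoding) of $\mathcal{C}_{t,n,k}$ with respect to the DIMC and the random code $\mathcal{C}_{t,n,k}$ itself. Then for any
block length $n$ and $\delta > 0$,

$$P_e(\mathcal{C}_{t,n,k}) \le P_{t,\delta} + e^{-n\left(I(t;P)-\delta-\mathcal{R}(\mathcal{C}_{t,n,k})\right) + nH(t) - \ln|\mathcal{T}^n_t|}. \tag{3.7}$$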



Remark 8. It is easy to show that

$$|\mathcal{T}^n_t| \ge \frac{e^{nH(t)}}{(n+1)^{|\mathcal{X}|}} \tag{3.8}$$

and therefore

$$nH(t) - \ln|\mathcal{T}^n_t| \le |\mathcal{X}|\ln(n+1). \tag{3.9}$$

The term $nH(t) - \ln|\mathcal{T}^n_t|$, instead of $|\mathcal{X}|\ln(n+1)$, is kept in (3.7) to make the bound slightly
tighter for small $n$.
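How much is gained by keeping $nH(t) - \ln|\mathcal{T}^n_t|$ rather than $|\mathcal{X}|\ln(n+1)$ can be checked numerically. The sketch below computes $\ln|\mathcal{T}^n_t|$ as the logarithm of a multinomial coefficient via the log-gamma function; the function names and the example counts are illustrative choices.

```python
import math

def log_type_class_size(counts):
    """ln |T_t^n| = ln( n! / prod_x (n t(x))! ), where counts[x] = n t(x)."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def type_penalty(counts):
    """n H(t) - ln |T_t^n| in nats."""
    n = sum(counts)
    n_entropy = -sum(c * math.log(c / n) for c in counts if c > 0)
    return n_entropy - log_type_class_size(counts)

counts = [60, 40]                                   # binary type with n = 100 (illustrative)
penalty = type_penalty(counts)
loose = len(counts) * math.log(sum(counts) + 1)     # |X| ln(n + 1)
print(f"n H(t) - ln|T| = {penalty:.3f} nats, |X| ln(n+1) = {loose:.3f} nats")
```

For this binary type with $n = 100$ the exact penalty is roughly a quarter of the cruder bound, which translates directly into a smaller exponent offset in (3.7).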



Similar to (2.4) in Theorem 1, the achievability bound (3.7) in Theorem 3 holds for any
$\delta > 0$ given the codeword type and the coding rate, and therefore the tightest bound is obtained
by further optimizing $\delta$. When $P_{t,\delta}$ cannot be efficiently calculated, an achievability bound of
analytic form is needed. Once again, some definitions are required.

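Given a DIMC $\{p(y|x), x \in \mathcal{X}, y \in \mathcal{Y}\}$ and a distribution $t \in \mathcal{P}(\mathcal{X})$, let

$$\lambda^*_-(t;P) = \sup\left\{\lambda \ge 0 : \sum_{a\in\mathcal{X}} t(a)\int p(y|a)\left[\frac{p(y|a)}{q_t(y)}\right]^{-\lambda} dy < +\infty\right\}. \tag{3.10}$$

It is easy to see that $\lambda^*_-(t;P)$ depends on $t$ only through its support, i.e. $\{x \in \mathcal{X} : t(x) \ne 0\}$.
Suppose that

$$\lambda^*_-(t;P) > 0. \tag{3.11}$$

Define for any $\delta \ge 0$

$$r_-(t,\delta) = \sup_{\lambda \ge 0}\left\{\lambda\,(\delta - I(t;P)) - \sum_{a\in\mathcal{X}} t(a)\ln\int p(y|a)\left[\frac{p(y|a)}{q_t(y)}\right]^{-\lambda} dy\right\}$$

and for any $\lambda \in [0, \lambda^*_-(t;P))$

$$f_{-\lambda,t}(y|x) = \frac{\left[\frac{p(y|x)}{q_t(y)}\right]^{-\lambda}}{\int p(v|x)\left[\frac{p(v|x)}{q_t(v)}\right]^{-\lambda} dv}$$

$$D(t,x,\lambda) = \int p(y|x)\,f_{-\lambda,t}(y|x)\ln\frac{p(y|x)}{q_t(y)}\,dy$$

$$\delta_-(t,\lambda) = \sum_{x\in\mathcal{X}} t(x)\int p(y|x)\,f_{-\lambda,t}(y|x)\left[-\ln\frac{p(y|x)}{q_t(y)}\right] dy + I(t;P).$$

Further define

$$\sigma^2_{D,-}(t;P,\lambda) = \sum_{x\in\mathcal{X}} t(x)\int p(y|x)\,f_{-\lambda,t}(y|x)\left|\ln\frac{p(y|x)}{q_t(y)} - D(t,x,\lambda)\right|^2 dy$$

$$M_{D,-}(t;P,\lambda) = \sum_{x\in\mathcal{X}} t(x)\int p(y|x)\,f_{-\lambda,t}(y|x)\left|\ln\frac{p(y|x)}{q_t(y)} - D(t,x,\lambda)\right|^3 dy$$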
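and

$$\bar{\xi}_{D,-}(t;P,\lambda,n) = \frac{2CM_{D,-}(t;P,\lambda)}{\sqrt{n}\,\sigma^3_{D,-}(t;P,\lambda)} + e^{\frac{n\lambda^2\sigma^2_{D,-}(t;P,\lambda)}{2}}\left[Q\!\left(\sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\right) - Q\!\left(\rho^* + \sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\right)\right] \tag{3.12}$$

with $Q(\rho^*) = \frac{CM_{D,-}(t;P,\lambda)}{\sqrt{n}\,\sigma^3_{D,-}(t;P,\lambda)}$. Write $\sigma^2_{D,-}(t;P,0)$ simply as $\sigma^2_D(t;P)$, $M_{D,-}(t;P,0)$ as $M_D(t;P)$,
$\sigma^2_D(p_X; P)$ as $\sigma^2_D(X; Y)$, and $M_D(p_X; P)$ as $M_D(X; Y)$. It is not hard to see that

$$\sigma^2_D(t;P) = \sum_{x\in\mathcal{X}} t(x)\left[\int p(y|x)\left|\ln\frac{p(y|x)}{q_t(y)}\right|^2 dy - \left(\int p(y|x)\ln\frac{p(y|x)}{q_t(y)}\,dy\right)^2\right]$$

and

$$M_D(t;P) = \sum_{x\in\mathcal{X}} t(x)\int p(y|x)\left|\ln\frac{p(y|x)}{q_t(y)} - \int p(v|x)\ln\frac{p(v|x)}{q_t(v)}\,dv\right|^3 dy.$$

For obvious reasons, $\sigma^2_D(t; P)$ ($\sigma^2_D(X;Y)$, respectively) is referred to as the conditional divergence
(or relative entropy$^4$) variance of $P$ given $t$ ($Y$ given $X$, respectively).
Assume that

$$\sigma^2_D(t; P) > 0 \text{ and } M_D(t; P) < +\infty. \tag{3.13}$$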



One can verify that Condition (3.13) depends on $t$ only through its support; in other words, once
Condition (3.13) is valid for a distribution $t \in \mathcal{P}(\mathcal{X})$, it is also valid for all distributions in $\mathcal{P}(\mathcal{X})$ with
the same support as that of $t$. In addition, it is not hard to verify that

$$\delta_-(t,0) = 0$$

$$\frac{\partial\delta_-(t,\lambda)}{\partial\lambda} = \sum_{x\in\mathcal{X}} t(x)\left[\int p(y|x)\,f_{-\lambda,t}(y|x)\left|\ln\frac{p(y|x)}{q_t(y)}\right|^2 dy - D^2(t,x,\lambda)\right] > 0$$

where the last inequality is due to (3.13). Therefore, $\delta_-(t, \lambda)$ as a function of $\lambda$ is strictly
increasing over $\lambda \in [0, \lambda^*_-(t; P))$. Let

$$\Delta^*_-(t) = \lim_{\lambda \uparrow \lambda^*_-(t;P)} \delta_-(t,\lambda).$$

It can be shown that $r_-(t,\delta)$ is strictly increasing, convex and continuously differentiable up to
at least the third order inclusive over $\delta \in [0, \Delta^*_-(t))$, and furthermore $r_-(t, \delta)$ has the following
parametric expression

$$r_-(t, \delta_-(t,\lambda)) = \lambda\,(\delta_-(t,\lambda) - I(t;P)) - \sum_{x\in\mathcal{X}} t(x)\ln\int p(y|x)\left[\frac{p(y|x)}{q_t(y)}\right]^{-\lambda} dy \tag{3.14}$$

with

$$\lambda = \frac{\partial r_-(t,\delta)}{\partial\delta}$$

satisfying

$$\delta_-(t,\lambda) = \delta.$$

Then we get the following result, which can be proved in the same way as Theorem
2 (where the non-asymptotic bounds on $P_{t,\delta}$ developed in Appendix B are used); its
proof is therefore omitted.

$^4$$\sigma^2_D(X; Y)$ coincides with channel dispersion defined in



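Theorem 4. For any DIMC $P$ and type $t$ satisfying (3.11) and (3.13), the following hold for
any block length $n$:

1) For any $\delta \in (0, \Delta^*_-(t))$

$$P_e(\mathcal{C}_{t,n,k}) \le (1 + \lambda)\,\bar{\xi}_{D,-}(t;P,\lambda,n)\,e^{-n r_-(t,\delta)} \tag{3.15}$$

whenever

$$\mathcal{R}(\mathcal{C}_{t,n,k}) \le I(t;P) - \delta - r_-(t,\delta) + \frac{\ln\left[\lambda\,\bar{\xi}_{D,-}(t;P,\lambda,n)\right] - nH(t) + \ln|\mathcal{T}^n_t|}{n} \tag{3.16}$$

where $\lambda = \frac{\partial r_-(t,\delta)}{\partial\delta}$ satisfying $\delta_-(t, \lambda) = \delta$.
2) For any real number $c$

$$P_e(\mathcal{C}_{t,n,k}) \le Q\!\left(\frac{c}{\sigma_D(t;P)}\right) + \frac{CM_D(t;P)}{\sqrt{n}\,\sigma^3_D(t;P)} + \frac{e^{-\frac{c^2}{2\sigma^2_D(t;P)}}}{\sqrt{2\pi n}\,\sigma_D(t;P)} \tag{3.17}$$

whenever

$$\mathcal{R}(\mathcal{C}_{t,n,k}) \le I(t;P) - \frac{c}{\sqrt{n}} - \frac{\ln n}{2n} - \frac{\frac{c^2}{2\sigma^2_D(t;P)} + \ln\left[\sqrt{2\pi}\,\sigma_D(t;P)\right] + nH(t) - \ln|\mathcal{T}^n_t|}{n}. \tag{3.18}$$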



Remark 9. Comments similar to Remarks 4 to 6 immediately following Theorem 2 apply to
Theorem 4 as well.



Remark 10. It is not hard to show that in the case of a BIMC

$$\sigma_D(X;Y) \le \sigma_H(X|Y) \tag{3.19}$$

and the inequality (3.19) is strict in general unless the BIMC happens to be a BIMSC such as
the BSC and BIAGC, in which case (3.19) holds with equality. Therefore, by comparing Theorem 4
with Theorem 2, we see that for a BIMC which is not a BIMSC, Shannon random codes with
a fixed codeword type are generally slightly better than random linear codes in terms of the
tradeoff between the coding rate and word error probability. In addition, since our bounds in
Theorem 4 are valid for any $n$ and $t$, one can further optimize the bounds in Theorem 4 over
all input types satisfying (3.11) and (3.13).



Given any DIMC $P$, fix a distribution $p$ on $\mathcal{X}$ satisfying (3.11) and (3.13). For any type
$t \in \mathcal{P}_n(\mathcal{X})$ having the same support as that of $p$ and satisfying



\t — p 



and for any $0 < \epsilon_n < 1$, let $\delta_{t,n} = \frac{\sigma_D(t;P)}{\sqrt{n}}\,Q^{-1}(\epsilon_n)$. In parallel with Corollary 1, we have the
following asymptotic result, which can be proved in a similar manner; its proof
is therefore omitted.



< 



n 



(3.20) 



Corollary 2. Suppose = o(l) as n +oo. Then we have 

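$$\mathcal{R}(\mathcal{C}_{t,n,k}) \ge I(t; P) - \delta_{t,n} - o(\delta_{t,n}) \tag{3.21}$$

and

$$P_e(\mathcal{C}_{t,n,k}) \le \epsilon_n$$

for any type $t \in \mathcal{P}_n(\mathcal{X})$ having the same support as that of $p$ and satisfying (3.20).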
Remark 11. In our companion paper [14], it is shown that $I(t; P)$ and $-\delta_{t,n}$ are the first and
second order of the best coding rate that can be achieved by any code with its codewords drawn
from $\mathcal{T}^n_t$ when the error probability is a constant or sub-exponentially decreasing with respect
to $n$. Corollary 2 shows that the achievability bounds in Theorem 4 are asymptotically tight up
to the second order as $n$ goes to $\infty$ with either a constant or sub-exponentially decreasing error
probability with respect to $n$.

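IV. Proof of Theorem 1

Recall the encoding procedure of $\mathcal{C}^{(\mathrm{Gal})}_{n,k}$. Let $X^n(q)$ be the transmitted codeword, where $q$ is
uniformly distributed over the punctured message space with message 0 deleted. Let $Y^n$ be the
output of the BIMC in response to $X^n(q)$. It is not hard to verify that for any $z^n \ne x^n \in \mathcal{X}^n$,

$$\Pr\{z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k} \,|\, X^n(q) = x^n\} = 2^{-(n-k)} = e^{-(n-k)\ln 2}. \tag{4.1}$$

To proceed, according to the decoding procedure specified in Section II, we have

$$\begin{aligned}
P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) &\le \Pr\{X^n(q) \notin J(Y^n)\} + \Pr\{\exists z^n \ne X^n(q), z^n \in J(Y^n), z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k}, X^n(q) \in J(Y^n)\} \\
&\le \Pr\{X^n(q) \notin J(Y^n)\} + \Pr\{\exists z^n \ne X^n(q), z^n \in J(Y^n), z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k}\}
\end{aligned} \tag{4.2}$$

where $J(Y^n)$ is the BIMC-L jar for $Y^n$. For any $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$, one can verify that

$$\begin{aligned}
&\Pr\{\exists z^n \ne X^n(q), z^n \in J(Y^n), z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k} \,|\, X^n(q) = x^n, Y^n = y^n\} \\
&\quad = \Pr\{\exists z^n \ne x^n, z^n \in J(y^n), z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k} \,|\, X^n(q) = x^n, Y^n = y^n\} \\
&\quad \overset{1)}{\le} \sum_{z^n \in J(y^n),\, z^n \ne x^n} \Pr\{z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k} \,|\, X^n(q) = x^n\} \\
&\quad \overset{2)}{\le} |J(y^n)|\,e^{-(n-k)\ln 2} \\
&\quad \le e^{-n\left(C_{\mathrm{BIMC-L}}-\delta-\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})\right)}
\end{aligned} \tag{4.3}$$

where the inequality 1) follows from the fact that given $X^n(q)$, $Y^n$ and $\mathcal{C}^{(\mathrm{Gal})}_{n,k}$ are conditionally
independent, the inequality 2) is due to (4.1), and finally the last inequality above is attributable
to the upper bound on the size of the jar $J(y^n)$ in (2.2). Since (4.3) is valid for any $x^n \in \mathcal{X}^n$
and $y^n \in \mathcal{Y}^n$, it follows that

$$\Pr\{\exists z^n \ne X^n(q), z^n \in J(Y^n), z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k}\} \le e^{-n\left(C_{\mathrm{BIMC-L}}-\delta-\mathcal{R}(\mathcal{C}^{(\mathrm{Gal})}_{n,k})\right)}. \tag{4.4}$$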
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To continue, let X" = X1X2 ■ ■ ■ Xn be a random variable taking values uniformly over A"", 
et = Z1Z2 ■■■Znhe the out 
that for any G A'"/{0"}, 



Let = Z1Z2 ■ ■ ■ Zn be the output of the BIMC in response to X". For C^^^''\ one can verify 



9 — (n— fc)n 

Pr{X"(g)=xn = J] 



2(n-mnfc(H(^_fc)xn)) _ \ 

{n — k)xn'-^{n — k)xn^ 



r)—(n—k)n 

E 



2(n-ra?iA:(H(„_fc)x„)) _ 
o— A;)n 

V = 

2("-™"^(H(„-fe)xnK„xn)) _ I 

E 



o("-™"'=(H(„_fc)x„)) _ 1 
H' n' aj/n^o"-* 

= Pr{X"(g) =x'"} 

where K„xn is an invertible matrix such that = K.nxn.x''^- This implies that for C^^k^\ X"(g) 
takes all sequences G A'"/{0"} equally likely. Since the zero sequence is not allowed by way 
of puncturing, it follows that the distribution of X"(g) is the same as the conditional distribution 
of X" given X" 7^ 0". Therefore, we have 

Pr{X"(g) ^ J(y")} = Pr{X" ^ J(Z")| X" ^ 0"} 

< ^z^Pr{X"^J(Z")}. (4.5) 



Putting ( [4l| ) and ([44|)-([43]) together yields 

Pe{Cf^k'^) < P^^^" ^ ^(^")} + e""(^^™°-^"^"^^''"^'="'^0 (4.6) 
and the theorem is proved by observing that 

Pr{X" ^ J(Z")} = Pr lnp(X,|Z,) > i/(X|y) + = (4.7) 
due to the definition of the BIMC-L jar. 
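As an illustration of how the bound (4.6) can be evaluated, the following minimal sketch specializes it to a BSC with uniform input. It is not taken from the paper: the function names are hypothetical, and it assumes that $H(X|Y)$ is the binary entropy of the crossover probability p in nats, that $C_{\mathrm{BIMC\text{-}L}} = \ln 2 - H(X|Y)$, and that the rate is $R = (k/n)\ln 2$. Under these assumptions $P_\delta$ in (4.7) is an exact binomial tail.

```python
import math

def log_binom(n, w):
    """Natural log of the binomial coefficient C(n, w)."""
    return math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)

def cond_entropy_bsc(p):
    """H(X|Y) in nats for a BSC with uniform input (assumed setting)."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def p_delta_bsc(n, p, delta):
    """P_delta of (4.7): probability that -(1/n) sum ln p(X_i|Z_i) exceeds H(X|Y) + delta."""
    threshold = n * (cond_entropy_bsc(p) + delta)
    total = 0.0
    for w in range(n + 1):  # w = number of channel flips
        log_seq_prob = w * math.log(p) + (n - w) * math.log(1 - p)
        if -log_seq_prob > threshold:
            total += math.exp(log_binom(n, w) + log_seq_prob)
    return total

def theorem1_rhs_bsc(n, k, p, delta):
    """Right-hand side of (4.6) for a BSC, under the assumptions stated above."""
    capacity = math.log(2) - cond_entropy_bsc(p)   # assumed C_BIMC-L, in nats
    rate = k * math.log(2) / n                     # assumed rate, in nats
    union_term = math.exp(-n * (capacity - delta - rate))
    return p_delta_bsc(n, p, delta) / (1.0 - 2.0 ** (-n)) + union_term

def best_over_grid(n, k, p, steps=40, delta_max=0.2):
    """Crude grid search over delta in [0, delta_max]; a stand-in for a real optimizer."""
    return min(theorem1_rhs_bsc(n, k, p, delta_max * i / steps) for i in range(steps + 1))

if __name__ == "__main__":
    print(best_over_grid(1000, 300, 0.11))
```

The grid search only hints at the role of $\delta$: a small $\delta$ inflates $P_\delta$, while a large $\delta$ inflates the union term.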

V. Proof of Theorem 2

Several tight non-asymptotic bounds on $P_\delta$ (called non-asymptotic equipartition property with
respect to conditional entropy) are developed in Appendix A. The inequalities (2.8) to (2.11)
can now be established from Theorem 1 by applying different upper bounds to $P_\delta$ in Theorem
5 in Appendix A. Towards proving part 1) of this theorem, by (A.3) in Theorem 5,

$$P_\delta \le \bar{\xi}_H(X|Y, \lambda, n)\, e^{-n r_{X|Y}(\delta)} \qquad (5.1)$$

where $\lambda = r'_{X|Y}(\delta)$. In the meantime, whenever (2.9) holds,

$$e^{-n(C_{\mathrm{BIMC\text{-}L}} - \delta - R(\mathcal{C}^{(\mathrm{Gal})}_{n,k}))} \le \lambda\, \bar{\xi}_H(X|Y, \lambda, n)\, e^{-n r_{X|Y}(\delta)}. \qquad (5.2)$$

Then (2.8) is yielded by plugging (5.1) and (5.2) into (2.4) in Theorem 1. The parametric form
of $P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k})$ and $R(\mathcal{C}^{(\mathrm{Gal})}_{n,k})$ in (2.8) and (2.9) comes from the effort of optimizing $\delta$. Indeed,
upon applying (A.3) to $P_\delta$, the optimal $\delta$ is given by minimizing

$$\bar{\xi}_H(X|Y, \lambda, n)\, e^{-n r_{X|Y}(\delta)} + e^{-n(C_{\mathrm{BIMC\text{-}L}} - \delta - R(\mathcal{C}^{(\mathrm{Gal})}_{n,k}))}$$

where the factor $\frac{1}{1 - 2^{-n}}$ is dropped due to its numeric insignificance. Setting the derivative of
the above quantity with respect to $\delta$ to zero results in

$$\left[\frac{1}{n}\frac{d\bar{\xi}_H(X|Y, \lambda, n)}{d\lambda}\frac{d\lambda}{d\delta} - \lambda\, \bar{\xi}_H(X|Y, \lambda, n)\right] e^{-n r_{X|Y}(\delta)} + e^{-n(C_{\mathrm{BIMC\text{-}L}} - \delta - R(\mathcal{C}^{(\mathrm{Gal})}_{n,k}))} = 0 \qquad (5.3)$$

as $\lambda = r'_{X|Y}(\delta)$. To simplify (5.3), $\frac{1}{n}\frac{d\bar{\xi}_H(X|Y, \lambda, n)}{d\lambda}\frac{d\lambda}{d\delta}$ is ignored, as the magnitude of this term is in
general much smaller than $\lambda\, \bar{\xi}_H(X|Y, \lambda, n)$ for reasonable values of n, and consequently, the optimal
$\delta$ can be approximated by solving (2.9) or (5.2) with equality.
To prove part 2), let 6 



and by (A.5 1, we have 



^-'^U^(X|F)y'^v^ aUX\Y) ' 



Meanwhile, 



< 



_g 2<T|j(x|y) 



(5.4) 



(5.5) 



W27rc7/f(X|F) 

whenever (2.11) is valid. Then (2.10) is proved by combining (5.4), (5.5) and (2.4) in Theorem
1. Similarly, the parametric form of $P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k})$ and $R(\mathcal{C}^{(\mathrm{Gal})}_{n,k})$ in (2.10) and (2.11) is yielded by
optimizing c to get the tightest bounds, as the solution of c to (2.11) or (5.5) with equality will
minimize

Q 



aH{X\Y) 



1 CMh{X\Y) , 



a|,(X|r) 






VI. Proof of Corollary 1
When e„ = e remains a constant with respect to n, 

6n = 0{n-'-') (6.1) 

and ( |2.12[ ) can be easily proved by part 2) of Theorem [2j Now we focus on the case when 
en = o(l) and = o(l) as n — )■ +oo. In this case, it is easy to verify that 5„ = o(l) and 

5n = uj{n~^-^), which further implies that = o((5„). Let 



2 , ^1 



for some constants do, di > 0, and A = r^|y(5)- Now we would like to show that by choosing 
proper c/q and di, 

(^Y^ + ^) ^^(Xir, A,n)e-"^-i-(^") < en. (6.2) 



Towards this, 



(yZ^. + a) ^„{X\Y, a, nje—l-W 
§ (1 . A . 0(2-")) (e^^g(v^W(.Y|K. A)) + ^^^^f^) 

W . . . . 1 d^\ -n( -^ll—^diP 



27rV^AcT//(X|F,A) 



V 27r^/r^5r, 



2<T|j(x|y) 



'n 



(e) 1 ^7 Jfn 

J V"'»n 2<T2j(X|y) 



2 ^i^^j"^'" '"^ 



where (a) is due to the definition of ^ni^lY, X,n); (b) follows ( |A.2| ) and the fact that 

^/ X 1 _££ 
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and ^"l^x\Y,\) ^ function of A is bounded in a small neighborhood of A = 0; (c) can be 
verified by 



and 



1 ds 



27ry/^XaH{X\Y,X) y/n 
aH{X\Y) / Sr^ 



^ aH{X\Y) ( 6^ 

- V2^^sA'^^hix\Y){i-oCx)) ^"^ 



27: ^/nSn yXajjiXlY) 
aH{X\Y) f 5^ 



aH{X\Y) f 5n 



2a%{X\Y) 2aUX\Y) 

- 2<tI{X\Y) ""'^^ 
for some constant de > 0; (d) is due to the inequality > 1 + x and nS^ — uj{l); (e) is valid 
by choosing 

do^aUX\Y)(d2 + d5 + de) 

and 

d,^a%{X\Y); 

(f) follows the inequality 

1 X . , 
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and (g) is due to the definition of Now by part 1) of Theorem [2| 

l<'[Cn,k ) > <^BIMC - - rx\Y[o) + 



n 



r R n(x^\ nl ^ \ Inn lny^A^//(X|r,A,n) 

L^BIMC — On — '-^[On) ~ ^ ^TT^ r 



n^6'^ J 2n n 



(1 \ 111 77 

+ (6-4) 

where the last step is due to Proposition 1 in Appendix A. The proof of this corollary is
completed by observing that



o{6n). 



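VII. Proof of Theorem 3

The proof proceeds along the same lines as the proof of Theorem 1. Let $X^n(q)$ be the transmitted
codeword, and $Y^n$ the output of the DIMC $P$ in response to $X^n(q)$. In parallel with (4.2), we
have

$$P_e(\mathcal{C}_{t,n,k}) \le \Pr\{X^n(q) \notin J(Y^n)\} + \Pr\{\exists z^n \neq X^n(q),\ z^n \in J(Y^n),\ z^n \in \mathcal{C}_{t,n,k}\} \qquad (7.1)$$

where $J(Y^n)$ is the DIMC jar based on type $t$ as defined in (3.5). Note that $X^n(q)$ is uniformly
distributed over $\mathcal{T}_t^n$. For any $x^n \in \mathcal{T}_t^n$ and $y^n \in \mathcal{Y}^n$, one can verify that

$$\Pr\{\exists z^n \neq X^n(q),\ z^n \in J(Y^n),\ z^n \in \mathcal{C}_{t,n,k} \mid X^n(q) = x^n, Y^n = y^n\}$$
$$\overset{(a)}{\le} |J(y^n)|\, \frac{2^k - 1}{|\mathcal{T}_t^n|}$$
$$\le |J(y^n)|\, e^{k \ln 2 - \ln|\mathcal{T}_t^n|}$$
$$\overset{(b)}{\le} e^{n[H(t) - I(t;P) + \delta]}\, e^{n[\frac{k}{n}\ln 2] - \ln|\mathcal{T}_t^n|}$$
$$= e^{-n[I(t;P) - \delta - R(\mathcal{C}_{t,n,k})] + nH(t) - \ln|\mathcal{T}_t^n|} \qquad (7.2)$$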
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where (a) is due to the fact that all codewords in Ct^n,k are independent, and each is distributed 
uniformly over 7^", and (b) is verified by 



) 



z"eJ{y") 



X] 



< 



Sx^eA'" ni=l ^iXi)p{yi\^i) 



Sx^eA"" ni=l ^i^i)piyi\^i 



< 1 



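since for any $x^n \in \mathcal{T}_t^n$,

$$\prod_{i=1}^{n} t(x_i) = e^{-nH(t)}$$

and $\mathcal{T}_t^n$ is only a subset of $\mathcal{X}^n$. Since (7.2) is valid for any $x^n \in \mathcal{T}_t^n$ and $y^n \in \mathcal{Y}^n$, it follows
that

$$\Pr\{\exists z^n \neq X^n(q),\ z^n \in J(Y^n),\ z^n \in \mathcal{C}_{t,n,k}\} \le e^{-n[I(t;P) - \delta - R(\mathcal{C}_{t,n,k})] + nH(t) - \ln|\mathcal{T}_t^n|}. \qquad (7.3)$$

The proof of this theorem is completed by observing that

$$\Pr\{X^n(q) \notin J(Y^n)\} = P_{t,\delta} \qquad (7.4)$$

as $X^n(q)$ is drawn from $\mathcal{T}_t^n$.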
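The exponent in (7.3) carries the extra term $nH(t) - \ln|\mathcal{T}_t^n|$, which is the price of restricting codewords to a single type class. The sketch below is an illustration only, with hypothetical function names: it takes a type as a tuple of symbol counts, computes $\ln|\mathcal{T}_t^n|$ as the log of a multinomial coefficient, and also checks the identity $\prod_i t(x_i) = e^{-nH(t)}$ used in the proof.

```python
import math

def type_class_log_size(counts):
    """ln |T_t^n| for the type with the given symbol counts (multinomial coefficient)."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def type_entropy(counts):
    """Entropy H(t) in nats of the type t(a) = counts[a] / n."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def type_penalty(counts):
    """The term n H(t) - ln |T_t^n| appearing in the exponent of (7.3)."""
    n = sum(counts)
    return n * type_entropy(counts) - type_class_log_size(counts)

if __name__ == "__main__":
    for counts in [(5, 5), (50, 50), (500, 500), (100, 900)]:
        print(counts, type_penalty(counts))
```

The penalty is nonnegative and grows only logarithmically in n, so after division by n it is a vanishing rate loss.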

VIII. Comparison with Existing Non-Asymptotic Achievability 



Although there are numerous achievability bounds [17], [18] (and references therein) on the
channel coding rate in the extensive literature of information theory, where various code
ensembles and bounding techniques are used, it does not seem that any of our random coding
theorems (Theorems 1, 2, 3, and 4) could be implied by existing achievability bounds in the
literature, because of either the generality of our channel models or the special structure of the
random code ensembles in our random coding theorems. For example, Theorems 1 and 2 are
concerned with Gallager parity check ensemble, wherein codewords are not necessarily pairwise
independent, and are applicable to any binary input memoryless channel without any symmetry
constraint whatsoever. On the other hand, most achievability bounds on linear block codes
are for binary input memoryless channels with symmetry [17]. Nonetheless, it is instructive
to compare our achievability bounds in Theorems 1, 2, 3, and 4 with existing bounds in the
literature whenever possible. Below we will compare our achievability bounds in Theorems 1
and 2 with existing bounds on random linear code ensembles for channels with symmetry, and
our achievability bounds in Theorems 3 and 4 with existing bounds on the existence of codes
with a fixed type.

A. Achievability on Random Linear Code Ensembles 

Random linear code ensembles include Elias generator ensemble and Gallager parity check 
ensemble. While codewords generated in Elias ensemble are pairwise independent, it is not true 
for Gallager ensemble. Consequently, non-asymptotic coding theorems on Shannon random code 
ensemble in the literature, whose proof relies on pairwise independence of codewords, apply only 
to Elias ensemble, but not to Gallager ensemble. Here we focus on those achievabilities applicable 
to random linear code ensembles, with the emphasis on Gallager ensemble. Furthermore, as some 
achievability bounds are only applicable to special channels, we divide our discussion into four 
parts: 1) bounds for BSCs; 2) bounds for BECs; 3) bounds for BIAGCs; and 4) bounds for 
MBIOS channels. 

1) BSC: To make the comparison transparent, we rewrite Theorem 1. Let $M = 2^k$ be the number
of codewords, and $p \in (0, 0.5)$ be the crossover probability. By (2.4) in Theorem 1 and Remark
2, it is not hard to verify that




Pr{X"^J(y")} 

Further optimizing $\delta$ implies that

$$P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le \sum_{w=0}^{n} \binom{n}{w} \min\{p^w (1-p)^{n-w},\ 2^{-n} M\} \qquad (8.2)$$

and (8.2) is essentially the same (except for a minor difference, noted in the footnote) as the Dependence Testing
Bound recently established in [1, Theorem 34] for Shannon random code ensemble and Elias
ensemble over the BSC.

As discussed in the introduction section, Poltyrev derived an achievability bound for any
deterministic code in terms of its Hamming weight profile $\{A(l)\}_{l=0}^{n}$ on BSCs, and by replacing
$A(l)$ with $2^{-(n-k)}\binom{n}{l}$, the resulting bound holds for Gallager ensemble $\mathcal{C}^{(\mathrm{Gal})}_{n,k}$, as well as Elias
ensemble. In addition, it was shown that Random Coding Union Bound [1, Theorem 33] derived
for Shannon random code ensemble and Elias ensemble is the same as Poltyrev's bound.
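The sum in (8.2) has n + 1 terms and can be evaluated directly in the log domain. The following sketch is an illustration with hypothetical function names; it assumes the form of (8.2) with $M = 2^k$ and searches for the largest k whose bound stays below a target word error probability.

```python
import math

def log_binom(n, w):
    """Natural log of the binomial coefficient C(n, w)."""
    return math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)

def bsc_jar_bound(n, k, p):
    """Right-hand side of (8.2) with M = 2**k, for crossover probability 0 < p < 0.5."""
    log_cap = (k - n) * math.log(2.0)              # ln(2^{-n} M)
    total = 0.0
    for w in range(n + 1):
        log_seq_prob = w * math.log(p) + (n - w) * math.log(1 - p)
        total += math.exp(log_binom(n, w) + min(log_seq_prob, log_cap))
    return min(total, 1.0)

def largest_k(n, p, target):
    """Largest k with bound <= target; relies on the bound being nondecreasing in k."""
    best = 0
    for k in range(1, n + 1):
        if bsc_jar_bound(n, k, p) <= target:
            best = k
        else:
            break
    return best

if __name__ == "__main__":
    n, p = 1000, 0.11
    k = largest_k(n, p, 1e-3)
    print("achievable rate (bits per channel use):", k / n)
```

Each evaluation costs O(n) operations, which matches the complexity the paper attributes to this bound.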











[Figure 3, panels (a) and (b), one per fixed word error probability $P_e$. Horizontal axis: block length.
Curves: Jar Decoding; Dependence Testing; Poltyrev (Random Coding Union); Error Exponent.]

Fig. 3. Comparison of Achievability for BSC with cross-over probability p = 0.11



Figure 3 shows the numeric comparison (with block length range [200, 3000] and the two fixed word
error probabilities of panels (a) and (b)) among Theorem 1, Poltyrev's Bound [6, Lemma 1] (Random
Coding Union Bound [1, Theorem 33]) and Error Exponent Bound on a BSC with cross-over
probability p = 0.11, where Dependence Testing Bound [1, Theorem 34] is also included as
a benchmark. As can be seen, the numeric result confirms that Theorem 1 is essentially the
same as Dependence Testing Bound and further shows that Poltyrev's Bound (Random Coding
Union Bound) is better than Dependence Testing Bound and Theorem 1 by a small margin,
while Dependence Testing Bound and Theorem 1 outperform Error Exponent Bound when the word
error probability is relatively large with respect to the block length, which is consistent with the
observation in [1].

Footnote: Replacing M in (8.2) by (M - 1)/2 yields exactly the Dependence Testing Bound [1, Theorem 34].

2) BEC: Now let us focus on a BEC. In this case, Theorem 1 can be further improved as
follows. Let $M = 2^k$ be the number of codewords and $p$ be the erasure probability. It is then
easy to verify that

$$H(X|Y) = p \ln 2$$

and in this case, the BIMC-L jar reduces to

$$J(y^n) = \begin{cases} \{x^n : x_i = y_i \text{ if } y_i \neq e\} & \text{if } |\{i : y_i = e\}| \le n\left(p + \frac{\delta}{\ln 2}\right) \\ \text{empty} & \text{otherwise.} \end{cases}$$

Following the argument in the proof of Theorem 1, we have

$$P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le \Pr\{X^n(q) \notin J(Y^n)\} + \Pr\{\exists z^n \neq X^n(q),\ z^n \in J(Y^n),\ z^n \in \mathcal{C}^{(\mathrm{Gal})}_{n,k}\}$$
$$\le \sum_{t > n(p + \delta/\ln 2)} \binom{n}{t} p^t (1-p)^{n-t} + \sum_{1 \le t \le n(p + \delta/\ln 2)} \binom{n}{t} p^t (1-p)^{n-t}\, 2^t\, 2^{-n} M \qquad (8.3)$$

and optimizing $\delta$ yields

$$P_e(\mathcal{C}^{(\mathrm{Gal})}_{n,k}) \le \sum_{t=1}^{n} \binom{n}{t} p^t (1-p)^{n-t} \min\{1,\ 2^{t-n} M\}$$
$$= \sum_{t=1}^{n} \binom{n}{t} p^t (1-p)^{n-t}\, 2^{-[n - t - \log_2 M]^+} \qquad (8.4)$$

which is again essentially the same (except for a minor difference, noted in the footnote) as the Dependence Testing
Bound [1, Theorem 37] for Shannon random code ensemble and Elias generator ensemble. Note
that the factor $\frac{1}{1 - 2^{-n}}$ in Theorem 1 is dropped here according to Remark 2.

Footnote: Replacing M by (M - 1)/2, and then starting the summation from t = 0 instead of t = 1 in (8.4), yields exactly the
Dependence Testing Bound [1, Theorem 37].

For BECs, Ashikhmin derived an expression for the word error probability of the full rank Elias
ensemble (i.e., the generator matrix is equiprobably selected among all full rank matrices),
included as Theorem 6 in [1]. Figure 4 shows the numeric comparison among (8.4), Ashikhmin's
Bound, Error Exponent Bound, and Dependence Testing Bound [1, Theorem 37]. Once again, our
achievability is very close to Dependence Testing Bound, outperforms Error Exponent Bound,
and is worse than Ashikhmin's Bound (the best achievability under ML decoding known so far)
by a small margin.
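For reference, the bound (8.4) can be evaluated in the same way as its BSC counterpart. The sketch below is illustrative only: the function names are hypothetical and it assumes the second form of (8.4) with $M = 2^k$, so that $\log_2 M = k$.

```python
import math

def log_binom(n, t):
    """Natural log of the binomial coefficient C(n, t)."""
    return math.lgamma(n + 1) - math.lgamma(t + 1) - math.lgamma(n - t + 1)

def bec_jar_bound(n, k, p):
    """Right-hand side of (8.4) with M = 2**k, for erasure probability 0 < p < 1."""
    total = 0.0
    for t in range(1, n + 1):                       # t = number of erasures
        log_pmf = log_binom(n, t) + t * math.log(p) + (n - t) * math.log(1 - p)
        log_penalty = -max(n - t - k, 0) * math.log(2.0)   # ln 2^{-[n - t - k]^+}
        total += math.exp(log_pmf + log_penalty)
    return min(total, 1.0)

if __name__ == "__main__":
    n, p = 1000, 0.5
    for k in (300, 400, 450, 480):
        print(k / n, bec_jar_bound(n, k, p))
```

When k = n every penalty factor equals 1, so the sum collapses to the probability of at least one erasure, which gives a simple sanity check on the implementation.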













[Figure 4, panels (a) and (b), one per fixed word error probability $P_e$. Horizontal axis: block length.
Curves: Ashikhmin; Dependence Testing; Jar Decoding; Error Exponent.]

Fig. 4. Comparison of Achievability for BEC with erasure probability p = 0.5



3) BIAGC: Since in this case there is no feasible way to calculate $P_\delta$, we apply part 1) of
Theorem 2, where $\frac{1}{1 - 2^{-n}}$ in (2.8) is replaced by 1 due to Remark 2.

There is a rich literature on error probability bounds of linear codes for BIAGCs. One
of the tightest bounds in this research area is TSB, proved by Poltyrev in [6]. TSB was then
improved by Yousefi and Khandani in [19], and Mehrabian and Yousefi in [20]. It is unclear,
however, whether those two improved bounds can be efficiently evaluated for Gallager parity
check ensemble. Although TSB is one of the tightest bounds for any deterministic code in terms
of its Hamming weight profile, it fails to reproduce the Gallager error exponent ([17] and
references therein) for Gallager parity check ensemble. Figure 5 shows the numerical comparison
among part 1) of Theorem 2 ((2.8) and (2.9)), TSB, and Error Exponent Bound, where the
signal-to-noise ratio (snr) is 0 dB and the word error probability is kept fixed. As can be
seen, TSB is worse than Error Exponent Bound, while our achievability is better than Error
Exponent Bound in a certain block length region. To the best of our knowledge, this is the first
numeric demonstration that Error Exponent Bound can be beaten in the non-asymptotic regime
for BIAGCs as well.





[Figure 5. Horizontal axis: block length. Curves: Jar Decoding; Error Exponent; TSB.]

Fig. 5. Comparison of Achievability for BIAGC with snr 0 dB and fixed word error probability $P_e$

4) General MBIOS Channels: The only existing achievability bound in the literature
applicable to this general case is Error Exponent Bound for Gallager ensemble, as well as Elias
ensemble. The symmetry property of MBIOS channels is essential to the proof of Error Exponent
Bound for Gallager ensembles. As demonstrated already, our achievability bounds in Theorems 1
and 2, applicable to any BIMC, can be tighter than Error Exponent Bound in the non-asymptotic
regime.

5) Summary: Applicability (to ensembles and channels) and computational complexity of jar
decoding achievability and existing achievability bounds for random linear code ensembles in
the literature are summarized in Table I, where by unknown, we mean that at this point we are
not aware of any method which can be used to effectively compute the corresponding bound.
Among all the listed results, Theorem 2 is the only achievability that can be applied to general
BIMCs and efficiently evaluated. Focusing on Gallager ensemble, existing achievability bounds
only deal with MBIOS channels, which are a strict subset of BIMCs. For some special MBIOS
channels, e.g. BSCs and BECs, there are bounds proved under ML decoding, which are better
than our achievability in (8.2) and (8.4) by a small margin in the non-asymptotic regime. For
general MBIOS channels, however, to the best of our knowledge, Error Exponent Bound was the
best computable achievability result in the literature before this paper. Numerical calculation
shows that the achievability bound in Theorem 2 can be tighter than Error Exponent Bound in
the non-asymptotic regime.



Achievability Bounds                | Applicability: Linear Code Ensembles | Applicability: BIMC | Computational Complexity
Jar Decoding                        | ✓ Elias, ✓ Gallager                  | BSC                 | O(n)
Jar Decoding                        | ✓ Elias, ✓ Gallager                  | BEC                 | O(n)
Jar Decoding (Theorem 2)            | ✓ Elias, ✓ Gallager                  | General             | O(1)
Poltyrev [6, Lemma 1]               | ✓ Elias, ✓ Gallager                  | BSC                 | O(n)
Ashikhmin [1, Theorem 6]            | ✓ Elias (full rank), × Gallager      | BEC                 | polynomial in n
TSB [6, Lemma 4]                    | ✓ Elias, ✓ Gallager                  | BIAGC               | O(1)
Error Exponent [5]                  | ✓ Elias, ✓ Gallager                  | MBIOS               | O(1)
Random Coding Union [1, Theorem 33] | ✓ Elias, × Gallager                  | BSC                 | O(n)
Random Coding Union [1, Theorem 16] | ✓ Elias, × Gallager                  | General             | Unknown
Dependence Testing [1, Theorem 34]  | ✓ Elias, × Gallager                  | BSC                 | O(n)
Dependence Testing [1, Theorem 37]  | ✓ Elias, × Gallager                  | BEC                 | O(n)
Dependence Testing [1, Theorem 17]  | ✓ Elias, × Gallager                  | General             | Unknown

TABLE I

Achievability bounds of Random Linear Codes for BIMCs



B. Achievability on Shannon Random Code Ensemble With a Fixed Codeword Type 

Technically speaking, when the channel input is discrete, achievability results for Shannon random
code ensemble also apply to the code ensemble with a fixed codeword type t, by restricting the
input distribution to $\mathcal{T}_t^n$. In this case, however, neither the input nor the output distribution has
the product form. Consequently, the evaluation of those achievability bounds becomes much
more challenging. In contrast, our achievability in Theorem 3 can always be easily computed
for DIMCs with discrete output, while Theorem 4 can be used when the channel output is
continuous. Therefore, in this subsection, we focus on those achievability bounds on random
code ensemble with a fixed codeword type which allow efficient evaluation.

Reviewing results in the literature, a connection between Theorem 3 and the $\kappa\beta$ bound [1, Theorem
25] is found. Towards showing this connection, the following definitions are needed. Let $q_1(w^n)$
and $q_2(w^n)$, $w^n \in \mathcal{W}^n$, be two distributions on a sample space $\mathcal{W}^n$, and $p_{Z|W^n}(z|w^n)$ be a
distribution over $z \in \{0, 1\}$ given any $w^n \in \mathcal{W}^n$. Define for $\alpha \in [0, 1]$

$$\beta_\alpha(q_1, q_2) = \min_{p_{Z|W^n}:\ \int q_1(w^n)\, p_{Z|W^n}(1|w^n)\, dw^n \ge \alpha} \int q_2(w^n)\, p_{Z|W^n}(1|w^n)\, dw^n. \qquad (8.5)$$

In hypothesis testing, the conditional distribution $p^*_{Z|W^n}$ achieving the above optimization can be
interpreted as an optimal randomized test between $q_1$ (null) and $q_2$ (alternative). Now given any
distribution $q_{Y^n}(y^n)$ over $y^n \in \mathcal{Y}^n$ and conditional distribution $p_{Y^n|X^n=x^n}(y^n) = p_{Y^n|X^n}(y^n|x^n)$
over $y^n \in \mathcal{Y}^n$ given any $x^n \in \mathcal{X}^n$, further define for $\alpha \in [0, 1]$

$$\beta_\alpha(x^n, q_{Y^n}) = \beta_\alpha(p_{Y^n|X^n=x^n}, q_{Y^n}). \qquad (8.6)$$

In addition, for $\mathsf{F} \subseteq \mathcal{X}^n$ and $\tau \in [0, 1]$, define

$$\kappa_\tau(\mathsf{F}, q_{Y^n}) = \inf_{p_{Z|Y^n}:\ \inf_{x^n \in \mathsf{F}} \int p_{Y^n|X^n}(y^n|x^n)\, p_{Z|Y^n}(1|y^n)\, dy^n \ge \tau} \int q_{Y^n}(y^n)\, p_{Z|Y^n}(1|y^n)\, dy^n. \qquad (8.7)$$

Then the following result is proved in [1].

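Result 1 ($\kappa\beta$ Bound [1, Theorem 25]). Given any channel $\{p_{Y^n|X^n}(y^n|x^n) : x^n \in \mathcal{X}^n, y^n \in \mathcal{Y}^n\}$
and $\mathsf{F} \subseteq \mathcal{X}^n$, there exists a channel code $\mathcal{C}_n$ with M codewords, all of which are from $\mathsf{F}$,
satisfying

$$M \ge \sup_{0 < \tau < P_e(\mathcal{C}_n)}\ \sup_{q_{Y^n}}\ \frac{\kappa_\tau(\mathsf{F}, q_{Y^n})}{\sup_{x^n \in \mathsf{F}} \beta_{1 - P_e(\mathcal{C}_n) + \tau}(x^n, q_{Y^n})}. \qquad (8.8)$$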

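In general, $\beta$ and $\kappa$ defined above are difficult to evaluate. Upper and lower bounds on $\beta$ and
$\kappa$ are provided in [1, Equations (103), (104), (106), (121) and (122)], and included here for easy
reference:

$$\beta_\alpha(q_1, q_2) \le \frac{1}{\sup\left\{\gamma : \Pr\left\{\frac{q_1(W^n)}{q_2(W^n)} \ge \gamma\right\} \ge \alpha\right\}} \qquad (8.9)$$

where $W^n$ follows the distribution $q_1$,

$$\beta_\alpha(x^n, q_{Y^n}) \ge \sup_{\gamma > 0} \frac{1}{\gamma}\left(\alpha - \Pr\left\{\frac{p_{Y^n|X^n}(Y^n|x^n)}{q_{Y^n}(Y^n)} \ge \gamma\right\}\right) \qquad (8.10)$$

where $Y^n$ follows the distribution $p_{Y^n|X^n=x^n}$ given $x^n$, and

$$\tau \int_{\mathsf{F}} p_{X^n}(x^n)\, dx^n \le \kappa_\tau(\mathsf{F}, q_{Y^n}) \le \tau \qquad (8.11)$$

when $q_{Y^n}$ satisfies

$$q_{Y^n}(y^n) = \int p_{X^n}(x^n)\, p_{Y^n|X^n}(y^n|x^n)\, dx^n.$$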



Now let us compare Theorem 3 and Result 1. Strictly speaking, Result 1 is not applicable to
Shannon random code ensemble with a fixed codeword type, as its proof constructs a channel
code in a greedy, deterministic way. Nevertheless, both Theorem 3 and Result 1 imply the
existence of channel codes with certain properties and performance. Specifically, given a type t,
let $\mathsf{F} = \mathcal{T}_t^n$ and $q_{Y^n}(y^n) = q_t(y^n) = \prod_{i=1}^{n} q_t(y_i)$. It is then easy to verify that $\beta_\alpha(x^n, q_t)$ is a
constant (denoted by $\beta_\alpha(q_t)$) depending on $x^n \in \mathsf{F}$ only through its type t. Consequently, the
bound (8.8) reduces to

$$M \ge \sup_{0 < \tau < P_e(\mathcal{C}_{t,n,k})} \frac{\kappa_\tau(\mathcal{T}_t^n, q_t)}{\beta_{1 - P_e(\mathcal{C}_{t,n,k}) + \tau}(q_t)}. \qquad (8.12)$$

From (8.11) and (8.9), it follows that

$$\kappa_\tau(\mathcal{T}_t^n, q_t) \ge \tau\, e^{-nH(t)}\, |\mathcal{T}_t^n| \qquad (8.13)$$

and $\forall x^n \in \mathcal{T}_t^n$,

$$\frac{1}{\beta_{1 - P_e(\mathcal{C}_{t,n,k}) + \tau}(x^n, q_t)} \ge \sup\left\{\gamma : \Pr\left\{\frac{p(Y^n|x^n)}{q_t(Y^n)} \ge \gamma\right\} \ge 1 - P_e(\mathcal{C}_{t,n,k}) + \tau\right\} \ge \sup_{\delta:\ P_{t,\delta} \le P_e(\mathcal{C}_{t,n,k}) - \tau} e^{n(I(t;P) - \delta)} \qquad (8.14)$$

where $Y^n$ is the channel response to $x^n$. Now plugging (8.13) and (8.14) into (8.12), taking the
logarithm and then dividing by n on both sides, we get

$$R(\mathcal{C}_{t,n,k}) \ge \sup_{0 < \tau < P_e(\mathcal{C}_{t,n,k})}\ \sup_{\delta:\ P_{t,\delta} \le P_e(\mathcal{C}_{t,n,k}) - \tau} \left[ I(t;P) - \delta + \frac{\ln \tau + \ln e^{-nH(t)}|\mathcal{T}_t^n|}{n} \right]$$
$$= \sup_{\delta:\ P_{t,\delta} < P_e(\mathcal{C}_{t,n,k})}\ \sup_{0 < \tau \le P_e(\mathcal{C}_{t,n,k}) - P_{t,\delta}} \left[ I(t;P) - \delta + \frac{\ln \tau + \ln e^{-nH(t)}|\mathcal{T}_t^n|}{n} \right]$$
$$= \sup_{\delta:\ P_{t,\delta} < P_e(\mathcal{C}_{t,n,k})} \left[ I(t;P) - \delta + \frac{\ln(P_e(\mathcal{C}_{t,n,k}) - P_{t,\delta}) + \ln e^{-nH(t)}|\mathcal{T}_t^n|}{n} \right] \qquad (8.15)$$

which is equivalent to (3.7) in Theorem 3. Consequently, both Result 1 and Theorem 3 imply
the existence of a channel code with a fixed codeword type t achieving the trade-off between the rate
and the word error probability in (3.7). Both results also go beyond this existence claim in their
own ways. Result 1 holds for maximal error probability, and the achievability (8.8) might be
tighter than (3.7) in general, although the evaluation of $\beta$ and $\kappa$ is quite challenging. Theorem
3, on the other hand, shows that the average coding performance (the rate and the word error
probability) of random code ensemble with a fixed codeword type can achieve (3.7), which
implies the existence result, but not vice versa.

Next, we move on to the error exponent result, proved by Fano in [7] for any discrete (input
and output) memoryless channel (DMC). In particular, Fano showed that given a DMC and a
type t, the error exponent achieved by Shannon random code ensemble with a fixed codeword
type t is larger than that achieved by Shannon random code ensemble with input distribution t
in general. Towards a numeric comparison between Fano's result and Theorem 3, we consider a
special DIMC with discrete output, the Z channel, shown in Figure 6. As can be seen, the Z channel

Fig. 6. Z Channel

and the BEC share some common properties. Consequently, the achievability in Theorem 3 can be
further improved by providing a better bound on the size of the jar $|J(y^n)|$ given a channel output
$y^n$. Given a type t, the improved achievability is shown below




where M = 2"'^('^*.".'=) and 



1 — p)"^ min < 



1,(M-1) 





(8.16) 



m 



t{0)n. Then (8.16) (Jar Decoding) is numerically compared 



with Fano's result on Z channel with different channel parameters p and input types t, where 
Gallager's Error Exponent Bound on Shannon random code ensemble with input distributions 
corresponding to t serves as a benchmark. 

As shown in Figures 7 and 8, Theorem 3 consistently outperforms Fano's error exponent
result. In addition, Figure 7 shows that due to the non-exponential term $[1 + e^{nH(t)}|\mathcal{T}_t^n|^{-1}]$
or $e^{nH(t)}|\mathcal{T}_t^n|^{-1}$, Fano's result could be worse than Gallager's, despite the relation between Fano's
and Gallager's error exponent functions. Meanwhile, in Figure 8, $p_X$ represents the capacity
achieving type, while $t^*$ is some type calculated in a way specified in [14]. A close look at
Figure 8 then reveals that curves in (b) are above their counterparts in (a), which suggests that
a capacity achieving input type or distribution is not necessarily optimal in the non-asymptotic
regime.

[Figure 7, panels (a) t = (0.5, 0.5) and (b) t = (0.1, 0.9). Horizontal axis: block length.]

Fig. 7. Comparison of Achievability for Z Channel with p = 0.5 and fixed word error probability $P_e$

[Figure 8, panels (a) $t = p_X$ and (b) $t = t^*$. Horizontal axis: block length.]

Fig. 8. Comparison of Achievability for Z Channel with p = 0.9 and fixed word error probability $P_e$



"In 7 , e"-^(''|7;"r^ is further bounded by (27rn)'-^lel'^l/i2 
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IX. Conclusion 

New non-asymptotic achievability bounds for random structured code ensembles, specifically
Gallager parity check ensemble and Shannon random code ensemble with a fixed codeword type,
have been derived for discrete input arbitrary output channels. These bounds are asymptotically
tight up to the second order of the coding rate as the block length n goes to infinity with
either constant or sub-exponentially decreasing error probability $\epsilon$. When combined with the
non-asymptotic equipartition property (NEP) developed in this paper, they are also easy to compute
for any discrete input arbitrary output channel. Numeric evaluation has demonstrated that our
achievability bound on Gallager parity check ensemble is the tightest achievability result known
so far in some non-asymptotic regime for binary input additive Gaussian channels. A key step
in establishing these new bounds is the introduction of a decoding rule called jar decoding,
which has led us to apply the union bound with respect to sequences inside a jar, instead of all
codewords inside a codebook. The concept of jar decoding and its related bounding techniques,
along with NEP, may be useful for the non-asymptotic analysis of other problems in information
theory as well.

Appendix A 

NON-ASYMPTOTIC EQUIPARTITION PROPERTY WITH RESPECT TO CONDITIONAL ENTROPY

In this appendix, we establish tight upper and lower bounds on $P_\delta$. In light of the asymptotic
equipartition property (AEP) in the sense of the convergence of $-\frac{1}{n}\ln p(X^n|Y^n)$ to $H(X|Y)$
as $n \to \infty$ in probability, these bounds (i.e., in (A.3)) will be referred to, with a slight abuse
of the term "equipartition", as the non-asymptotic equipartition property (NEP) with respect to
conditional entropy.

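Theorem 5 (NEP With Respect to $H(X|Y)$). For any positive integer n,

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \le e^{-n r_{X|Y}(\delta)} \qquad (A.1)$$

where $X^n = X_1 X_2 \cdots X_n$, $Y^n = Y_1 Y_2 \cdots Y_n$, and $(X_i, Y_i)$, $i = 1, 2, \cdots, n$, are independent
and identically distributed with $p(x, y)$. Moreover, under the assumptions (2.5) and (2.7), the
following also hold:

(a) There exists a $\delta^* > 0$ such that for any $\delta \in (0, \delta^*]$,

$$r_{X|Y}(\delta) = \frac{1}{2\sigma_H^2(X|Y)}\,\delta^2 + O(\delta^3). \qquad (A.2)$$

(b) For any $\delta \in (0, \Delta^*(X|Y))$ and any positive integer n,

$$\underline{\xi}_H(X|Y, \lambda, n)\, e^{-n r_{X|Y}(\delta)} \le \Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} \le \bar{\xi}_H(X|Y, \lambda, n)\, e^{-n r_{X|Y}(\delta)} \qquad (A.3)$$

where $\lambda = r'_{X|Y}(\delta) > 0$, $\bar{\xi}_H(X|Y, \lambda, n)$ is defined in (2.6), and

$$\underline{\xi}_H(X|Y, \lambda, n) = e^{\frac{n\lambda^2 \sigma_H^2(X|Y, \lambda)}{2}}\, Q\left(\rho^* + \sqrt{n}\,\lambda\,\sigma_H(X|Y, \lambda)\right) \qquad (A.4)$$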



with Oio^-^-- 



(c) For any 6 < where c < aH{X\Y) is a constant, 

\aH{X\Y)J v^(T^(X|r) t ^ 

< c( ] I ^^^(-^1^) (A5) 



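Proof: The inequality (A.1) follows from the Chernoff bound. To see this is indeed the
case, note that

$$\Pr\left\{-\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta\right\} = \Pr\left\{-\ln p(X^n|Y^n) > n(H(X|Y) + \delta)\right\}$$
$$\le \inf_{\lambda \ge 0} \frac{\mathbf{E}\left[p^{-\lambda}(X^n|Y^n)\right]}{e^{n\lambda(H(X|Y) + \delta)}}$$
$$= \inf_{\lambda \ge 0} e^{-n\left[\lambda(H(X|Y) + \delta) - \ln \mathbf{E}[p^{-\lambda}(X_1|Y_1)]\right]}$$
$$= e^{-n r_{X|Y}(\delta)}. \qquad (A.6)$$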
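To make the Chernoff step in (A.6) concrete, the sketch below evaluates it for a BSC with uniform input and compares it with the exact tail probability. This is an illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, $\mathbf{E}[p^{-\lambda}(X_1|Y_1)] = p^{1-\lambda} + (1-p)^{1-\lambda}$ is specific to the BSC, and the supremum over $\lambda$ is approximated by a ternary search on a bounded interval (which can only weaken the bound, never invalidate it).

```python
import math

def log_binom(n, w):
    """Natural log of the binomial coefficient C(n, w)."""
    return math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)

def rate_function_bsc(p, delta, lam_max=60.0, iters=200):
    """Approximates sup over lam in [0, lam_max] of lam*(H + delta) - ln E[p^{-lam}(X|Y)]."""
    h = -p * math.log(p) - (1 - p) * math.log(1 - p)
    log_p, log_q = math.log(p), math.log(1 - p)

    def objective(lam):
        a, b = (1 - lam) * log_p, (1 - lam) * log_q
        m = max(a, b)                      # log-sum-exp for numerical stability
        log_mgf = m + math.log(math.exp(a - m) + math.exp(b - m))
        return lam * (h + delta) - log_mgf

    lo, hi = 0.0, lam_max                  # ternary search on a concave objective
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return max(objective(0.5 * (lo + hi)), 0.0)

def exact_tail_bsc(n, p, delta):
    """Exact Pr{-(1/n) ln p(X^n|Y^n) > H(X|Y) + delta} for the BSC."""
    h = -p * math.log(p) - (1 - p) * math.log(1 - p)
    total = 0.0
    for w in range(n + 1):
        log_seq_prob = w * math.log(p) + (n - w) * math.log(1 - p)
        if -log_seq_prob > n * (h + delta):
            total += math.exp(log_binom(n, w) + log_seq_prob)
    return total

if __name__ == "__main__":
    n, p = 200, 0.11
    for delta in (0.05, 0.1, 0.2):
        print(delta, exact_tail_bsc(n, p, delta), math.exp(-n * rate_function_bsc(p, delta)))
```

The gap between the two printed columns is the sub-exponential factor that parts (b) and (c) of Theorem 5 are designed to capture.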



To show (A.2), we first analyze the property of $r_{X|Y}(\delta)$ as a function of $\delta$ over the region $\delta > 0$.
Using a similar argument as in [21, Properties 1 to 3], it is not hard to show that under the
assumption (2.5), $\delta(\lambda)$ as a function of $\lambda$ is continuously differentiable up to any order over
$\lambda \in (0, \lambda^*(X|Y))$. Taking the first order derivative of $\delta(\lambda)$ yields

$$\delta'(\lambda) = \iint \frac{p(y)\,p^{-\lambda+1}(x|y)}{\iint p(v)\,p^{-\lambda+1}(u|v)\,du\,dv}\,\left[-\ln p(x|y)\right]^2 dx\,dy - \left[\iint \frac{p(y)\,p^{-\lambda+1}(x|y)}{\iint p(v)\,p^{-\lambda+1}(u|v)\,du\,dv}\,\left[-\ln p(x|y)\right] dx\,dy\right]^2 > 0 \qquad (A.7)$$

where the last inequality is due to (2.7). It is also easy to see that $\delta(0) = 0$ and $\delta'(0) = \sigma_H^2(X|Y)$.
Therefore, $\delta(\lambda)$ is strictly increasing over $\lambda \in [0, \lambda^*(X|Y))$. On the other hand, it is not hard to
verify that under the assumption (2.5), the function $\lambda(H(X|Y) + \delta) - \ln \iint p(y)\,p^{-\lambda+1}(x|y)\,dx\,dy$
as a function of $\lambda$ is continuously differentiable over $\lambda \in [0, \lambda^*(X|Y))$ with its derivative equal
to

$$\delta - \delta(\lambda). \qquad (A.8)$$

To continue, we distinguish between two cases: (1) $\lambda^*(X|Y) = \infty$, and (2) $\lambda^*(X|Y) < \infty$. In
case (1), since $\delta(\lambda)$ is strictly increasing over $\lambda \in [0, \infty)$, it follows that for any $\delta = \delta(\lambda)$ for
some $\lambda \in [0, \lambda^*(X|Y))$, the supremum in the definition of $r_{X|Y}(\delta)$ is actually achieved at that
particular $\lambda$, i.e.,

$$r_{X|Y}(\delta(\lambda)) = \lambda(H(X|Y) + \delta(\lambda)) - \ln \iint p(y)\,p^{-\lambda+1}(x|y)\,dx\,dy. \qquad (A.9)$$

In case (2), we have that for any $\delta = \delta(\lambda)$ for some $\lambda \in [0, \lambda^*(X|Y))$,

$$\beta(H(X|Y) + \delta(\lambda)) - \ln \iint p(y)\,p^{-\beta+1}(x|y)\,dx\,dy < \lambda(H(X|Y) + \delta(\lambda)) - \ln \iint p(y)\,p^{-\lambda+1}(x|y)\,dx\,dy \qquad (A.10)$$

for any $\beta \in [0, \lambda^*(X|Y))$ with $\beta \neq \lambda$. In view of the definition of $\lambda^*(X|Y)$, (A.10) remains
valid for any $\beta > \lambda^*(X|Y)$ since then the left side of (A.10) is $-\infty$. What remains to check is
the case $\beta = \lambda^*(X|Y)$. If

$$\iint p(y)\,p^{-\lambda^*(X|Y)+1}(x|y)\,dx\,dy = \infty$$

it is easy to see that (A.10) holds as well when $\beta = \lambda^*(X|Y)$. Suppose now

$$\iint p(y)\,p^{-\lambda^*(X|Y)+1}(x|y)\,dx\,dy < \infty.$$

In this case, it follows from the dominated convergence theorem that

$$\lim_{\beta \uparrow \lambda^*(X|Y)} \iint p(y)\,p^{-\beta+1}(x|y)\,dx\,dy = \iint p(y)\,p^{-\lambda^*(X|Y)+1}(x|y)\,dx\,dy$$

and hence by letting $\beta$ go to $\lambda^*(X|Y)$ from the left, we see that (A.10) holds as well when
$\beta = \lambda^*(X|Y)$. Putting all cases together, we always have that for any $\delta = \delta(\lambda)$ for some
$\lambda \in [0, \lambda^*(X|Y))$,

$$r_{X|Y}(\delta(\lambda)) = \lambda(H(X|Y) + \delta(\lambda)) - \ln \iint p(y)\,p^{-\lambda+1}(x|y)\,dx\,dy. \qquad (A.11)$$

Let

$$\Delta^*(X|Y) = \lim_{\lambda \uparrow \lambda^*(X|Y)} \delta(\lambda).$$



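Since both $\delta(\lambda)$ and $\ln \iint p(y)\,p^{-\lambda+1}(x|y)\,dx\,dy$ are continuously differentiable with respect to
$\lambda \in (0, \lambda^*(X|Y))$ up to any order, it follows from (A.11) that $r_{X|Y}(\delta)$ is also continuously
differentiable with respect to $\delta \in (0, \Delta^*(X|Y))$ up to any order. (At $\delta = 0$, $r_{X|Y}(\delta)$ is
continuously differentiable up to at least the third order inclusive.) Taking the first and second order
derivatives of $r_{X|Y}(\delta)$ with respect to $\delta$, we have

$$r'_{X|Y}(\delta) = \frac{d r_{X|Y}(\delta)}{d\delta} = \frac{d r_{X|Y}(\delta(\lambda))}{d\lambda}\frac{d\lambda}{d\delta} = \frac{d r_{X|Y}(\delta(\lambda))}{d\lambda}\frac{1}{\delta'(\lambda)}$$
$$= \frac{1}{\delta'(\lambda)}\left[H(X|Y) + \delta(\lambda) + \lambda\delta'(\lambda) - \iint \frac{p(y)\,p^{-\lambda+1}(x|y)}{\iint p(v)\,p^{-\lambda+1}(u|v)\,du\,dv}\left[-\ln p(x|y)\right] dx\,dy\right]$$
$$= \lambda \qquad (A.12)$$

and

$$r''_{X|Y}(\delta) = \frac{d\lambda}{d\delta} = \frac{1}{\delta'(\lambda)} \qquad (A.13)$$

where $\delta = \delta(\lambda)$. Therefore, $r_{X|Y}(\delta)$ is convex, strictly increasing, and continuously differentiable
up to at least the third order (inclusive) over $\delta \in [0, \Delta^*(X|Y))$. Note that from (A.12) and (A.13),
we have $r'_{X|Y}(0) = 0$ and $r''_{X|Y}(0) = 1/\sigma_H^2(X|Y)$. Expanding $r_{X|Y}(\delta)$ at $\delta = 0$ by the Taylor
expansion, we then have that there exists a $\delta^* > 0$ such that

$$r_{X|Y}(\delta) = \frac{1}{2\sigma_H^2(X|Y)}\,\delta^2 + O(\delta^3) \qquad (A.14)$$

for $\delta \in (0, \delta^*]$.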
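The quadratic behaviour in (A.14) can be checked numerically in a case where the rate function has a closed form. The sketch below assumes a BSC with uniform input and crossover probability p, for which $-\ln p(X|Y)$ is an affine function of a Bernoulli(p) variable; in that special case the supremum defining $r_{X|Y}(\delta)$ equals the binary Kullback-Leibler divergence $D(q \| p)$ with $q = p + \delta / \ln\frac{1-p}{p}$, and $\sigma_H^2(X|Y) = p(1-p)\ln^2\frac{1-p}{p}$. These closed forms and the function names are assumptions of this illustration, not statements from the paper.

```python
import math

def rate_function_bsc_closed_form(p, delta):
    """r(delta) for the BSC as a binary KL divergence D(q || p), valid for p < q < 1."""
    c = math.log((1 - p) / p)          # spread between the two values of -ln p(X|Y)
    q = p + delta / c                  # flip fraction whose information density is H + delta
    if q <= p:
        return 0.0
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def quadratic_approximation(p, delta):
    """Leading term delta^2 / (2 sigma_H^2) of (A.14) for the BSC."""
    sigma2 = p * (1 - p) * math.log((1 - p) / p) ** 2
    return delta ** 2 / (2 * sigma2)

if __name__ == "__main__":
    p = 0.11
    for delta in (0.001, 0.01, 0.05, 0.1):
        exact = rate_function_bsc_closed_form(p, delta)
        print(delta, exact, quadratic_approximation(p, delta))
```

The ratio of the two columns approaches 1 as $\delta$ shrinks, consistent with the $O(\delta^3)$ remainder.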
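Now towards proving parts (b) and (c) of this theorem, by (A.11) it is not hard to verify that

$$
\begin{aligned}
&\Pr\left\{ -\frac{1}{n}\ln p(X^n|Y^n) > H(X|Y) + \delta \right\} \\
&= \int\!\!\int_{-\frac{1}{n}\ln p(x^n|y^n) > H(X|Y)+\delta} p(x^n, y^n)\,dx^n dy^n \\
&= \int\!\!\int_{-\frac{1}{n}\ln p(x^n|y^n) > H(X|Y)+\delta} f_\lambda^{-1}(x^n, y^n)\, f_\lambda(x^n, y^n)\, p(x^n, y^n)\,dx^n dy^n \\
&= \int\!\!\int_{-\frac{1}{n}\ln p(x^n|y^n) > H(X|Y)+\delta} e^{-n\left[ -\frac{\lambda}{n}\ln p(x^n|y^n) - \lambda(H(X|Y)+\delta) + r_{X|Y}(\delta) \right]} f_\lambda(x^n, y^n)\, p(x^n, y^n)\,dx^n dy^n \\
&= e^{-n r_{X|Y}(\delta)} \int\!\!\int_{-\frac{1}{n}\ln p(x^n|y^n) > H(X|Y)+\delta} e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\,\frac{-\ln p(x^n|y^n) - n(H(X|Y)+\delta)}{\sqrt{n}\sigma_H(X|Y,\lambda)}} f_\lambda(x^n, y^n)\, p(x^n, y^n)\,dx^n dy^n \\
&= e^{-n r_{X|Y}(\delta)} \int\!\!\int_{\rho > 0:\ \frac{-\ln p(x^n|y^n) - n(H(X|Y)+\delta)}{\sqrt{n}\sigma_H(X|Y,\lambda)} = \rho} e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho}\, f_\lambda(x^n, y^n)\, p(x^n, y^n)\,dx^n dy^n \\
&= -e^{-n r_{X|Y}(\delta)} \int_0^{+\infty} e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho}\, dF_n(\rho) \\
&= e^{-n r_{X|Y}(\delta)} \int_0^{+\infty} \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\, e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \left[ F_n(0) - F_n(\rho) \right] d\rho
\end{aligned}
\tag{A.15}
$$

where the last equality is due to integration by parts,

$$F_n(\rho) \triangleq \Pr\left\{ \frac{\sum_{i=1}^{n} \left[ -\ln p(X_i|Y_i) - (H(X|Y)+\delta) \right]}{\sqrt{n}\sigma_H(X|Y,\lambda)} > \rho \right\},$$

and $\{(X_i, Y_i)\}_{i=1}^{n}$ are IID random variable pairs with pmf or pdf (as the case may be) $f_\lambda(x, y)\,p(x, y)$.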
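The change of measure behind (A.15) can be verified exactly on a toy example by brute-force enumeration. The following sketch rests on assumptions that are not taken from the paper: the joint pmf `P` is hypothetical, the alphabet is finite so integrals become sums, and the tilted pmf is taken to be $f_\lambda(x,y)\,p(x,y) \propto p(x,y)\,p^{-\lambda}(x|y)$. It compares the tail probability computed directly with $e^{-n r_{X|Y}(\delta)}$ times the tilted expectation appearing in the fifth line of (A.15).

```python
import itertools
import math

# Hypothetical joint pmf over pairs (x, y); not taken from the paper.
P = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}
PY = {y: sum(v for (_, yy), v in P.items() if yy == y) for y in (0, 1)}
S = {xy: -math.log(v / PY[xy[1]]) for xy, v in P.items()}   # s(x, y) = -ln p(x|y)
H = sum(P[xy] * S[xy] for xy in P)                           # H(X|Y) in nats

def tilt(lam):
    """Return the tilted pmf f_lam(x,y) p(x,y) and its normaliser sum p(x,y) p^{-lam}(x|y)."""
    z = sum(P[xy] * math.exp(lam * S[xy]) for xy in P)
    return {xy: P[xy] * math.exp(lam * S[xy]) / z for xy in P}, z

def both_sides(n, lam):
    """Left side of (A.15) by direct summation, and the tilted form of its fifth line."""
    tilted, z = tilt(lam)
    delta = sum(tilted[xy] * S[xy] for xy in P) - H          # delta(lam)
    r = lam * (H + delta) - math.log(z)                      # (A.11)
    direct = 0.0
    tilted_sum = 0.0
    for seq in itertools.product(P, repeat=n):               # all (x^n, y^n) sequences
        s_n = sum(S[xy] for xy in seq)                       # -ln p(x^n | y^n)
        if s_n > n * (H + delta):
            direct += math.prod(P[xy] for xy in seq)
            weight = math.prod(tilted[xy] for xy in seq)
            tilted_sum += weight * math.exp(-lam * (s_n - n * (H + delta)))
    return direct, math.exp(-n * r) * tilted_sum

print(both_sides(4, 0.3))   # the two numbers should coincide up to rounding
```

Because the exponential factor inside the tilted sum never exceeds 1 on the integration region, the same identity also shows directly that the probability is at most $e^{-n r_{X|Y}(\delta)}$.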
Let



+00 







(A.16) 



+00 



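$$\int_0^{+\infty} \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\, e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \left[ F_n(0) - F_n(\rho) \right] d\rho. \tag{A.17}$$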



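At this point, we invoke the following central limit theorem of Berry and Esseen [22, Theorem 1.2].

Lemma 1. Let $V_1, V_2, \cdots$ be independent real random variables with zero means and finite third moments, and set

$$S_n^2 = \sum_{i=1}^{n} \mathbf{E} V_i^2 .$$

Then there exists a universal constant $C < 1$ such that for any $n \ge 1$,

$$\sup_{-\infty < t < +\infty} \left| \Pr\left\{ \sum_{i=1}^{n} V_i > S_n t \right\} - Q(t) \right| \le C S_n^{-3} \sum_{i=1}^{n} \mathbf{E}|V_i|^3 .$$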
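Lemma 1 can be illustrated numerically for a case where the left-hand side is computable exactly. The sketch below assumes centred IID Bernoulli variables $V_i = B_i - p$ (an illustrative choice, not one used in the paper) and takes $C = 0.56$, the value quoted later for the Berry-Esseen constant. The function names are hypothetical.

```python
import math

def Q(t):
    """Gaussian tail probability Q(t) = Pr{N(0,1) > t}."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def berry_esseen_gap(n, p):
    """Exact sup_t |Pr{sum V_i > S_n t} - Q(t)| for V_i = B_i - p, B_i IID Bernoulli(p)."""
    s_n = math.sqrt(n * p * (1 - p))
    tail = 1.0                                   # Pr{sum B_i >= k}, starting at k = 0
    gap = 0.0
    for k in range(n + 1):
        t = (k - n * p) / s_n                    # location of the k-th jump
        pmf = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        # The tail probability equals `tail` just left of t and `tail - pmf` at t;
        # between jumps it is constant while Q is monotone, so the sup sits at a jump.
        gap = max(gap, abs(tail - Q(t)), abs(tail - pmf - Q(t)))
        tail -= pmf
    return gap

def berry_esseen_bound(n, p, C=0.56):
    """Right-hand side C * S_n^{-3} * sum_i E|V_i|^3 for the same variables."""
    third = p * (1 - p) * ((1 - p) ** 2 + p ** 2)          # E|B - p|^3
    return C * n * third / (n * p * (1 - p)) ** 1.5

for n in (10, 100, 1000):
    print(n, berry_esseen_gap(n, 0.3), berry_esseen_bound(n, 0.3))
```

Both columns decay like $1/\sqrt{n}$, with the exact gap staying below the bound.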



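Towards evaluating the integral in (A.17), we can bound $F_n(\rho)$ in terms of $Q(\rho)$ by applying Lemma 1 to $\{-\ln p(X_i|Y_i) - (H(X|Y)+\delta)\}_{i=1}^{n}$. Then for $\rho > 0$, we have

$$F_n(0) \le Q(0) + \frac{C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} = \frac{1}{2} + \frac{C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)}, \tag{A.18}$$

$$F_n(0) - F_n(\rho) \le Q(0) - Q(\rho) + \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} = \frac{1}{2} - Q(\rho) + \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)}, \tag{A.19}$$

and

$$F_n(0) - F_n(\rho) \ge \left[ Q(0) - Q(\rho) - \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} \right]^+ = \left[ \frac{1}{2} - \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} - Q(\rho) \right]^+ \tag{A.20}$$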
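where $[x]^+ = \max\{x, 0\}$. Now plugging (A.18) and (A.19) into (A.15) yields

$$
\begin{aligned}
&\int_0^{+\infty} \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\, e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \left[ F_n(0) - F_n(\rho) \right] d\rho \\
&\le \int_0^{+\infty} \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\, e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \min\left\{ \frac{1}{2} + \frac{C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)},\ \frac{1}{2} + \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} - Q(\rho) \right\} d\rho \\
&= \int_0^{\rho^*} \left[ \frac{1}{2} + \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} - Q(\rho) \right] d\left[ -e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \right]
 + \int_{\rho^*}^{+\infty} \left[ \frac{1}{2} + \frac{C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} \right] d\left[ -e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \right] \\
&= \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} + \int_0^{\rho^*} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\rho^2}{2}}\, e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho}\, d\rho \\
&= \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} + e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}} \int_0^{\rho^*} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(\rho + \sqrt{n}\lambda\sigma_H(X|Y,\lambda))^2}{2}}\, d\rho \\
&= \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} + e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}} \left[ Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) - Q\left(\rho^* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) \right] \\
&= \bar{\xi}_H(X|Y,\lambda,n)
\end{aligned}
\tag{A.21}
$$

where $Q(\rho^*) = \frac{C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)}$, so that the second argument of the minimum is the smaller one exactly when $\rho \le \rho^*$, and the boundary terms produced by the integration by parts cancel. Meanwhile, plugging (A.20) into (A.15) yields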



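$$
\begin{aligned}
&\int_0^{+\infty} \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\, e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \left[ F_n(0) - F_n(\rho) \right] d\rho \\
&\ge \int_0^{+\infty} \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\, e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \left[ \frac{1}{2} - \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} - Q(\rho) \right]^+ d\rho \\
&= \int_{\rho_*}^{+\infty} \left[ \frac{1}{2} - \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)} - Q(\rho) \right] d\left[ -e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho} \right] \\
&= \int_{\rho_*}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\rho^2}{2}}\, e^{-\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\rho}\, d\rho \\
&= e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\rho_* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) \\
&= \underline{\xi}_H(X|Y,\lambda,n)
\end{aligned}
\tag{A.22}
$$

where $Q(\rho_*) = \frac{1}{2} - \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)}$. Combining (A.15) with (A.21) and (A.22) completes the proof of part (b) of Theorem 5.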
Applying Lemma 1 to the IID sequence $\{-\ln p(X_i|Y_i) - H(X|Y)\}_{i=1}^{n}$, we get (A.5). This completes the proof of Theorem 5. ■
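The closed forms reached in (A.21) and (A.22) are straightforward to evaluate. The sketch below is a minimal illustration, assuming the caller supplies numerical values for $\lambda$, $\sigma_H(X|Y,\lambda)$ and $M_H(X|Y,\lambda)$ (here passed as `lam`, `sigma`, `M`) and taking $C = 0.56$; the function name `xi_bounds` and the sample parameter values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def Q(t):
    """Gaussian tail probability Q(t) = Pr{N(0,1) > t}."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def Q_inv(prob):
    """Inverse of Q on (0, 1), i.e. Q(Q_inv(prob)) = prob."""
    return -_STD_NORMAL.inv_cdf(prob)

def xi_bounds(n, lam, sigma, M, C=0.56):
    """Return (lower, upper) multipliers of exp(-n r) as in (A.22) and (A.21).

    Assumes 0 < 2*C*M/(sqrt(n) sigma^3) < 1/2 so that rho_* and rho^* are well defined,
    and that n*lam^2*sigma^2/2 is small enough for math.exp not to overflow.
    """
    a = math.sqrt(n) * lam * sigma                 # sqrt(n) * lam * sigma_H
    b = C * M / (math.sqrt(n) * sigma ** 3)        # Berry-Esseen term
    if not 0.0 < b < 0.25:
        raise ValueError("Berry-Esseen term outside the range assumed by this sketch")
    rho_upper = Q_inv(b)                           # Q(rho^*) = b
    rho_lower = Q_inv(0.5 - 2.0 * b)               # Q(rho_*) = 1/2 - 2b
    scale = math.exp(a * a / 2.0)
    upper = 2.0 * b + scale * (Q(a) - Q(rho_upper + a))
    lower = scale * Q(rho_lower + a)
    return lower, upper

print(xi_bounds(n=1000, lam=0.05, sigma=0.5, M=0.2))
```

Multiplying the two returned values by $e^{-n r_{X|Y}(\delta)}$ gives the lower and upper bounds on the tail probability in part (b) of Theorem 5.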



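Proposition 1. When $\lambda = o(1)$ and $\lambda = \Omega(1/\sqrt{n})$ as $n \to +\infty$, we have

$$e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) = \Theta\left( \frac{1}{\sqrt{n}\lambda} \right) \tag{A.23}$$

and

$$\bar{\xi}_H(X|Y,\lambda,n) = e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) (1 + O(\lambda)) \tag{A.24}$$

$$\underline{\xi}_H(X|Y,\lambda,n) = e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) (1 - O(\lambda)). \tag{A.25}$$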



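Proof: Note that $\lambda = r'_{X|Y}(\delta) = \Theta(\delta)$. When $\lambda = \Omega(1/\sqrt{n})$ with respect to $n$, it can be easily verified that $\bar{\xi}_H(X|Y,\lambda,n)$ and $\underline{\xi}_H(X|Y,\lambda,n)$ are both on the order of $\frac{1}{\sqrt{n}\lambda}$ by applying the well-known inequality

$$\frac{1}{\sqrt{2\pi}}\, \frac{t}{1+t^2}\, e^{-\frac{t^2}{2}} \le Q(t) \le \frac{1}{\sqrt{2\pi}\, t}\, e^{-\frac{t^2}{2}}, \qquad t > 0. \tag{A.26}$$

Meanwhile, on one hand, it is easy to see that

$$\bar{\xi}_H(X|Y,\lambda,n) \le e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) + \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)}. \tag{A.27}$$

On the other hand,

$$
\begin{aligned}
\underline{\xi}_H(X|Y,\lambda,n) &= e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\rho_* + \sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) \\
&= e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) - e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}} \int_0^{\rho_*} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(\rho + \sqrt{n}\lambda\sigma_H(X|Y,\lambda))^2}{2}}\, d\rho \\
&= e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) - \int_0^{\rho_*} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\rho^2 + 2\rho\sqrt{n}\lambda\sigma_H(X|Y,\lambda)}{2}}\, d\rho \\
&\ge e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) - \int_0^{\rho_*} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\rho^2}{2}}\, d\rho \\
&= e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) - \frac{2 C M_H(X|Y,\lambda)}{\sqrt{n}\sigma_H^3(X|Y,\lambda)}
\end{aligned}
\tag{A.28}
$$

where the last step uses $\int_0^{\rho_*} \frac{1}{\sqrt{2\pi}} e^{-\rho^2/2}\, d\rho = \frac{1}{2} - Q(\rho_*)$. To further shed light on $\bar{\xi}_H(X|Y,\lambda,n)$ and $\underline{\xi}_H(X|Y,\lambda,n)$, we observe from (A.26) that

$$\frac{1}{\sqrt{2\pi}\sqrt{n}\lambda\sigma_H(X|Y,\lambda) + \frac{\sqrt{2\pi}}{\sqrt{n}\lambda\sigma_H(X|Y,\lambda)}} \le e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) \le \frac{1}{\sqrt{2\pi}\sqrt{n}\lambda\sigma_H(X|Y,\lambda)}. \tag{A.29}$$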
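And therefore, whenever $\lambda = o(1)$ and $\lambda = \Omega(n^{-1/2})$,

$$e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) = \Theta\left( \frac{1}{\sqrt{n}\lambda} \right) \tag{A.30}$$

which further implies

$$\bar{\xi}_H(X|Y,\lambda,n) = e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) (1 + o(1)) \tag{A.31}$$

$$\underline{\xi}_H(X|Y,\lambda,n) = e^{\frac{n\lambda^2\sigma_H^2(X|Y,\lambda)}{2}}\, Q\left(\sqrt{n}\lambda\sigma_H(X|Y,\lambda)\right) (1 - o(1)). \tag{A.32}$$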
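The scaling in (A.30) follows from a two-sided bound on the Gaussian tail, and both are easy to probe numerically. The sketch below uses the classical sandwich $\frac{t}{1+t^2}\varphi(t) \le Q(t) \le \frac{1}{t}\varphi(t)$ with $\varphi$ the standard normal density; the choice $\lambda = n^{-1/4}$ with $\sigma_H = 1$ is purely illustrative and the function names are hypothetical.

```python
import math

def Q(t):
    """Gaussian tail probability Q(t) = Pr{N(0,1) > t}."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def q_sandwich(t):
    """Classical lower and upper bounds on Q(t), valid for t > 0."""
    phi = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
    return phi * t / (1.0 + t * t), phi / t

def scaled_tail(n, lam, sigma):
    """sqrt(n)*lam * exp(n lam^2 sigma^2 / 2) * Q(sqrt(n) lam sigma).

    By the sandwich this stays between a^2/(1+a^2) and 1 times 1/(sqrt(2 pi) sigma),
    where a = sqrt(n) lam sigma, which is the Theta(1/(sqrt(n) lam)) behaviour.
    """
    a = math.sqrt(n) * lam * sigma
    return math.sqrt(n) * lam * math.exp(a * a / 2.0) * Q(a)

for n in (10 ** 2, 10 ** 4, 10 ** 6):
    lam = n ** -0.25                # lam = o(1) and lam = Omega(1/sqrt(n))
    print(n, scaled_tail(n, lam, 1.0), 1.0 / math.sqrt(2.0 * math.pi))
```

As $n$ grows the scaled quantity approaches $1/\sqrt{2\pi}$ from below in this example. For much larger values of $\sqrt{n}\lambda\sigma_H$ the product of a huge exponential and a tiny tail would need a scaled complementary error function to avoid overflow, which this sketch does not attempt.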



Appendix B 

Non-asymptotic Equipartition Property with respect to Relative Entropy 
In this appendix, we establish tight upper and lower bounds on $P_{t,-\delta}$. Once again, in light of the AEP with respect to relative entropy, these bounds (i.e., those in (B.3)) are referred to as the NEP with respect to relative entropy.

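Theorem 6 (NEP With Respect to Relative Entropy). For any sequence $x^n = x_1 \cdots x_n$ from $\mathcal{X}$, let $t \in \mathcal{P}$ be the type of $x^n$, i.e., $n t(a)$, $a \in \mathcal{X}$, is the number of times the symbol $a$ appears in $x^n$. Then

$$\Pr\left\{ \frac{1}{n} \ln \frac{p(Y^n|X^n)}{q_t(Y^n)} \le I(t;P) - \delta \,\middle|\, X^n = x^n \right\} \le e^{-n r_-(t,\delta)}. \tag{B.1}$$

Furthermore, under the assumptions (3.11) and (3.13), the following also hold:

(a) There exists a $\delta^* > 0$ such that for any $\delta \in (0, \delta^*]$

$$r_-(t,\delta) = \frac{1}{2\sigma_D^2(t;P)}\,\delta^2 + O(\delta^3). \tag{B.2}$$

(b) For any $\delta \in (0, \Delta^*_-(t))$

$$\underline{\xi}_{D,-}(t;P,\lambda,n)\, e^{-n r_-(t,\delta)} \le \Pr\left\{ \frac{1}{n} \ln \frac{p(Y^n|X^n)}{q_t(Y^n)} \le I(t;P) - \delta \,\middle|\, X^n = x^n \right\} \le \bar{\xi}_{D,-}(t;P,\lambda,n)\, e^{-n r_-(t,\delta)} \tag{B.3}$$

where $\lambda = \frac{\partial r_-(t,\delta)}{\partial \delta} > 0$, $\bar{\xi}_{D,-}(t;P,\lambda,n)$ is defined in (3.12), and

$$\underline{\xi}_{D,-}(t;P,\lambda,n) = e^{\frac{n\lambda^2\sigma_{D,-}^2(t;P,\lambda)}{2}}\, Q\left(\rho_* + \sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\right) \tag{B.4}$$

with $Q(\rho_*) = \frac{1}{2} - \frac{2 C M_{D,-}(t;P,\lambda)}{\sqrt{n}\sigma_{D,-}^3(t;P,\lambda)}$.

(c) For any $\delta \le c\sqrt{\frac{\ln n}{n}}$, where $c < \sigma_D(t;P)$ is a constant,

$$Q\left( \frac{\delta\sqrt{n}}{\sigma_D(t;P)} \right) - \frac{C M_D(t;P)}{\sqrt{n}\sigma_D^3(t;P)} \le \Pr\left\{ \frac{1}{n} \ln \frac{p(Y^n|X^n)}{q_t(Y^n)} \le I(t;P) - \delta \,\middle|\, X^n = x^n \right\} \le Q\left( \frac{\delta\sqrt{n}}{\sigma_D(t;P)} \right) + \frac{C M_D(t;P)}{\sqrt{n}\sigma_D^3(t;P)} \tag{B.5}$$

where $0 < C < 0.56$ is the universal constant in the Berry-Esseen central limit theorem [23].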



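Proof: The inequality (B.1) comes from the Chernoff bound. To see this is indeed the case, note that

$$
\begin{aligned}
&\Pr\left\{ \frac{1}{n} \ln \frac{p(Y^n|X^n)}{q_t(Y^n)} \le I(t;P) - \delta \,\middle|\, X^n = x^n \right\} \\
&\le \inf_{\lambda > 0} \frac{\mathbf{E}\left[ \left( \frac{p(Y^n|X^n)}{q_t(Y^n)} \right)^{-\lambda} \,\middle|\, X^n = x^n \right]}{e^{n\lambda(\delta - I(t;P))}} \\
&= \inf_{\lambda > 0} \frac{\prod_{a \in \mathcal{X}} \left[ \int p(y|a) \left[ \frac{p(y|a)}{q_t(y)} \right]^{-\lambda} dy \right]^{n t(a)}}{e^{n\lambda(\delta - I(t;P))}} \\
&= \inf_{\lambda > 0} \exp\left\{ -n \left[ \lambda(\delta - I(t;P)) - \sum_{a \in \mathcal{X}} t(a) \ln \int p(y|a) \left[ \frac{p(y|a)}{q_t(y)} \right]^{-\lambda} dy \right] \right\} \\
&= e^{-n r_-(t,\delta)}
\end{aligned}
\tag{B.6}
$$

which completes the proof of (B.1).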
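The Chernoff exponent in (B.6) can be evaluated numerically for a concrete channel and compared with an exact tail probability. The sketch below makes several assumptions that are not in the paper: the channel is a binary symmetric channel with crossover probability `eps`, the type is $t = (t_0, 1 - t_0)$, the supremum over $\lambda$ is replaced by a grid search (which can only underestimate the exponent, so the resulting bound remains valid), and the exact tail is computed only for the uniform type, where the log-likelihood ratio depends on the output only through the number of bit flips. All function names are hypothetical.

```python
import math

def bsc_llr(eps, t0):
    """Return (t, p, llr, I) for a BSC(eps) and input type t = (t0, 1 - t0).

    llr[a][y] = ln p(y|a)/q_t(y) with q_t(y) = sum_a t(a) p(y|a); I = I(t;P) in nats.
    """
    t = (t0, 1.0 - t0)
    p = ((1 - eps, eps), (eps, 1 - eps))
    q = [t[0] * p[0][y] + t[1] * p[1][y] for y in (0, 1)]
    llr = [[math.log(p[a][y] / q[y]) for y in (0, 1)] for a in (0, 1)]
    mi = sum(t[a] * p[a][y] * llr[a][y] for a in (0, 1) for y in (0, 1))
    return t, p, llr, mi

def r_minus(eps, t0, delta, lam_max=20.0, grid=4000):
    """Grid approximation (from below) of the supremum over lam of the exponent in (B.6)."""
    t, p, llr, mi = bsc_llr(eps, t0)
    best = 0.0                                    # lam = 0 gives exponent 0
    for k in range(1, grid + 1):
        lam = k * lam_max / grid
        log_m = sum(t[a] * math.log(sum(p[a][y] * math.exp(-lam * llr[a][y])
                                        for y in (0, 1)))
                    for a in (0, 1))
        best = max(best, lam * (delta - mi) - log_m)
    return best

def exact_tail_uniform(eps, n, delta):
    """Exact Pr{(1/n) ln p(Y^n|x^n)/q_t(Y^n) <= I - delta} for the uniform type (n even)."""
    _, _, llr, mi = bsc_llr(eps, 0.5)
    same, flip = llr[0][0], llr[0][1]             # by symmetry, independent of the input symbol
    return sum(math.comb(n, k) * eps ** k * (1 - eps) ** (n - k)
               for k in range(n + 1)              # k = number of flipped positions
               if (n - k) * same + k * flip <= n * (mi - delta))

r = r_minus(0.11, 0.5, 0.1)
print(exact_tail_uniform(0.11, 200, 0.1), math.exp(-200 * r))
```

The first printed number is the exact probability and the second is the Chernoff bound $e^{-n r_-(t,\delta)}$ of (B.1); the gap between them is what the multiplicative factors in (B.3) are designed to close.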

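The equation (B.2) follows from the Taylor expansion of $r_-(t,\delta)$ at $\delta = 0$ and the fact that

$$\frac{\partial^2 r_-(t,\delta)}{\partial \delta^2} = \frac{1}{\sigma_D^2(t;P)}$$

at $\delta = 0$. What remains is to prove (B.3) and (B.5). To this end, let

$$f_{-\lambda}(y^n|x^n) \triangleq \prod_{i=1}^{n} f_{-\lambda}(y_i|x_i).$$

With $\lambda = \frac{\partial r_-(t,\delta)}{\partial \delta}$, it follows from (3.14) that

$$r_-(t,\delta) = \lambda(\delta - I(t;P)) - \sum_{x \in \mathcal{X}} t(x) \ln \int p(y|x) \left[ \frac{p(y|x)}{q_t(y)} \right]^{-\lambda} dy.$$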
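Then we have

$$
\begin{aligned}
&\Pr\left\{ \frac{1}{n} \ln \frac{p(Y^n|X^n)}{q_t(Y^n)} \le I(t;P) - \delta \,\middle|\, X^n = x^n \right\} \\
&= \int_{\frac{1}{n} \ln \frac{p(y^n|x^n)}{q_t(y^n)} \le I(t;P) - \delta} f_{-\lambda}^{-1}(y^n|x^n)\, f_{-\lambda}(y^n|x^n)\, p(y^n|x^n)\, dy^n \\
&= e^{-n r_-(t,\delta)} \int_{\frac{1}{n} \ln \frac{p(y^n|x^n)}{q_t(y^n)} \le I(t;P) - \delta} e^{\lambda \left[ \ln \frac{p(y^n|x^n)}{q_t(y^n)} - n(I(t;P) - \delta) \right]} f_{-\lambda}(y^n|x^n)\, p(y^n|x^n)\, dy^n \\
&= e^{-n r_-(t,\delta)} \int_{\rho \le 0:\ \frac{\ln \frac{p(y^n|x^n)}{q_t(y^n)} - n(I(t;P) - \delta)}{\sqrt{n}\sigma_{D,-}(t;P,\lambda)} = \rho} e^{\sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\rho}\, f_{-\lambda}(y^n|x^n)\, p(y^n|x^n)\, dy^n \\
&= e^{-n r_-(t,\delta)} \int_{-\infty}^{0} \sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\, e^{\sqrt{n}\lambda\sigma_{D,-}(t;P,\lambda)\rho} \left[ \bar{F}_n(0) - \bar{F}_n(\rho) \right] d\rho
\end{aligned}
\tag{B.7}
$$

where

$$\bar{F}_n(\rho) \triangleq \Pr\left\{ \frac{\sum_{i=1}^{n} \ln \frac{p(Z_i|x_i)}{q_t(Z_i)} - n(I(t;P) - \delta)}{\sqrt{n}\sigma_{D,-}(t;P,\lambda)} \le \rho \right\},$$

$Z_1, Z_2, \cdots, Z_n$ are independent random variables, and $Z_i$ takes values over the alphabet of $Y$ according to the pmf or pdf (as the case may be) $f_{-\lambda}(z|x_i)\,p(z|x_i)$. It is easy to verify that

$$\mathbf{E}\left[ \ln \frac{p(Z_i|x_i)}{q_t(Z_i)} \right] = D(t, x_i, \lambda)$$

and

$$\sum_{i=1}^{n} \mathbf{E}\left[ \ln \frac{p(Z_i|x_i)}{q_t(Z_i)} \right] = \sum_{i=1}^{n} D(t, x_i, \lambda) = n \sum_{x \in \mathcal{X}} t(x) D(t, x, \lambda) = n(I(t;P) - \delta)$$

which further implies that

$$\bar{F}_n(\rho) = \Pr\left\{ \frac{\sum_{i=1}^{n} \left[ \ln \frac{p(Z_i|x_i)}{q_t(Z_i)} - D(t, x_i, \lambda) \right]}{\sqrt{n}\sigma_{D,-}(t;P,\lambda)} \le \rho \right\}.$$

Applying Lemma 1 to the independent sequence

$$\left\{ \ln \frac{p(Z_i|x_i)}{q_t(Z_i)} - D(t, x_i, \lambda) \right\}_{i=1}^{n},$$

an argument similar to that in the proof of Theorem 5 can then be used to establish (B.3).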

Finally, consider another sequence of independent random variables $W_1, W_2, \cdots, W_n$, where $W_i$ takes values over the alphabet of $Y$ according to the pmf or pdf (as the case may be) $p(w|x_i)$. Applying Lemma 1 directly to

$$\left\{ \ln \frac{p(W_i|x_i)}{q_t(W_i)} - \mathbf{E}\left[ \ln \frac{p(W_i|x_i)}{q_t(W_i)} \right] \right\}_{i=1}^{n},$$

we then get (B.5). This completes the proof of Theorem 6.
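The normal approximation of part (c) can be compared with an exact computation in a simple case. The sketch below assumes a binary symmetric channel with crossover probability `eps` and the uniform type, so that the per-letter log-likelihood ratios are identically distributed; it takes $\sigma_D^2(t;P)$ and $M_D(t;P)$ to be the variance and third absolute central moment of $\ln\frac{p(W|x)}{q_t(W)}$ under $p(\cdot|x)$, which is an assumption about notation defined in the main text, and it uses $C = 0.56$. The function name is hypothetical.

```python
import math

def Q(t):
    """Gaussian tail probability Q(t) = Pr{N(0,1) > t}."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def bsc_normal_check(eps, n, delta, C=0.56):
    """Return (exact tail, Q(delta sqrt(n)/sigma), C*M/(sqrt(n) sigma^3)) for a BSC(eps)
    with the uniform input type (n even), all in nats."""
    same = math.log(2.0 * (1.0 - eps))        # ln p(y|a)/q_t(y) when the bit is not flipped
    flip = math.log(2.0 * eps)                # ... and when it is flipped
    mi = (1 - eps) * same + eps * flip        # I(t;P)
    var = (1 - eps) * (same - mi) ** 2 + eps * (flip - mi) ** 2
    third = (1 - eps) * abs(same - mi) ** 3 + eps * abs(flip - mi) ** 3
    exact = sum(math.comb(n, k) * eps ** k * (1 - eps) ** (n - k)
                for k in range(n + 1)         # k = number of flipped positions
                if (n - k) * same + k * flip <= n * (mi - delta))
    center = Q(delta * math.sqrt(n) / math.sqrt(var))
    slack = C * third / (math.sqrt(n) * var ** 1.5)
    return exact, center, slack

exact, center, slack = bsc_normal_check(0.11, 500, 0.05)
print(exact, center - slack, center + slack)
```

The exact probability should fall inside the printed interval, in line with the two-sided form of (B.5); the interval width shrinks like $1/\sqrt{n}$.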